Synthetic Peptides And Uses Therefore.
FIELD OF THE INVENTION 

THIS INVENTION relates generally to agents for modulating immune responses. 
More particularly, the present invention relates to a synthetic polypeptide comprising a 
plurality of different segments of a parent polypeptide, wherein the segments are linked to
each other such that one or more functions of the parent polypeptide are impeded,
abrogated or otherwise altered and such that the synthetic polypeptide, when introduced
into a suitable host, can elicit an immune response against the parent polypeptide. The
invention also relates to synthetic polynucleotides encoding the synthetic polypeptides and
to synthetic constructs comprising these polynucleotides. The invention further relates to
the use of the polypeptides and polynucleotides of the invention in compositions for
modulating immune responses. The invention also extends to methods of using such
compositions for prophylactic and/or therapeutic purposes. 

Bibliographic details of various publications referred to in this specification are 
collected at the end of the description.

BACKGROUND OF THE INVENTION 

The modern reductionist approach to vaccine and therapy development has been
pursued for a number of decades and attempts to focus only on those parts of pathogens or
of cancer proteins which are relevant to the immune system. To date the performance of
this approach has been relatively poor considering the vigorous research carried out and
the number of effective vaccines and therapies that it has produced. This approach is still
being actively pursued, however, despite its poor performance because vaccines developed
using this approach are often extremely safe and because only by completely
understanding the immune system can new vaccine strategies be developed.

One area that has benefited greatly from research efforts is knowledge about how
the adaptive immune system operates and more specifically how T and B cells learn to
recognise specific parts of pathogens and cancers. T cells are mainly involved in
cell-mediated immunity whereas B cells are involved in the generation of antibody-mediated
immunity. The two most important types of T cells involved in adaptive cellular immunity
are αβ CD8+ cytotoxic T lymphocytes (CTL) and CD4+ T helper lymphocytes. CTL are
important mediators of cellular immunity against many viruses, tumours, some bacteria
and some parasites because they are able to kill infected cells directly and secrete various
factors which can have powerful effects on the spread of infectious organisms. CTLs
recognise epitopes derived from foreign intracellular proteins, which are 8-10 amino acids
long and which are presented by class I major histocompatibility complex (MHC)
molecules (in humans called human lymphocyte antigens - HLAs) (Jardetzky et al., 1991;
Fremont et al., 1992; Rotzschke et al., 1990). T helper cells enhance and regulate CTL
responses and are necessary for the establishment of long-lived memory CTL. They also
inhibit infectious organisms by secreting cytokines such as IFN-γ. T helper cells recognise
epitopes derived mostly from extracellular proteins which are 12-25 amino acids long and
which are presented by class II MHC molecules (Chicz et al., 1993; Newcomb et al.,
1993). B cells, or more specifically the antibodies they secrete, are important mediators in
the control and clearance of mostly extracellular organisms. Antibodies recognise mainly
conformational determinants on the surface of organisms, for example, although
sometimes they may recognise short linear determinants.

Despite significant advances towards understanding how T and linear B cell
epitopes are processed and presented to the immune system, the full potential of
epitope-based vaccines has not been fully exploited. The main reason for this is the large number
of different T cell epitopes, which have to be included into such vaccines to cover the
extreme HLA polymorphism in the human population. The human HLA diversity is one of
the main reasons why whole pathogen vaccines frequently provide better population
coverage than subunit or peptide-based vaccine strategies. There is a range of
epitope-based strategies though which have tried to solve this problem, e.g., peptide blends, peptide
conjugates and polyepitope vaccines (i.e., comprising strings of multiple epitopes) (Dyall et
al., 1995; Thomson et al., 1996; Thomson et al., 1998; Thomson et al., 1998). These
approaches however will always be sub-optimal not only because of the slow pace of
epitope characterisation but also because it is virtually impossible for them to cover every
existing HLA polymorphism in the population. A number of strategies have sought to
avoid both problems by not identifying epitopes and instead incorporating larger amounts
of sequence information, e.g., approaches using whole genes or proteins and approaches
that mix multiple protein or gene sequences together. The proteins used by these strategies
however sometimes still function and therefore can compromise vaccine safety, e.g., whole
cancer proteins. Alternative strategies have tried to improve the safety of vaccines by
fragmenting the genes and expressing them either separately or as complex mixtures, e.g.,
library DNA immunisation, or by ligating such fragments back together. These approaches
are still sub-optimal because they are too complex, generate poor levels of immunity,
cannot guarantee that all proteins no longer function and/or that all fragments are present,
which compromises substantially complete immunological coverage.

The lack of a safe and efficient vaccine strategy that can provide substantially
complete immunological coverage is an important problem, especially when trying to
develop vaccines against rapidly mutating and persistent viruses such as HIV and hepatitis
C virus, because partial population coverage could allow vaccine-resistant pathogens to
re-emerge in the future. Human immunodeficiency virus (HIV) is an RNA lentivirus
approximately 9 kb in length, which infects CD4+ T cells, causing T cell decline and AIDS
typically 3-8 years after infection. It is currently the most serious human viral infection,
evidenced by the number of people currently infected with HIV or who have died from
AIDS, estimated by the World Health Organisation (WHO) and UNAIDS in their AIDS
epidemic update (December 1999) to be 33.6 and 16.3 million people, respectively. The
spread of HIV is also now increasing fastest in areas of the world where over half of the
human population reside, hence an effective vaccine is desperately needed to curb the
spread of this epidemic. Despite the urgency, an effective vaccine for HIV is still some
way off because of delays in defining the correlates of immune protection, lack of a
suitable animal model, existence of up to 8 different subtypes of HIV and a high HIV
mutation rate.

A significant amount of research has been carried out to try and develop a vaccine
capable of generating neutralising antibody responses that can protect against field isolates
of HIV. Despite these efforts, it is now clear that the variability, instability and
inaccessibility of critical determinants on the HIV envelope protein will make it extremely
difficult and perhaps impossible to develop such a vaccine (Kwong et al., 1998). The
limited ability of antibodies to block HIV infection is also supported by the observation
that development of AIDS correlates primarily with a reduction in CTL responsiveness to
HIV and not to altered antibody levels (Ogg et al., 1998). Hence CTL-mediated and not
antibody-mediated responses appear to be critical for maintaining the asymptomatic state
in vivo. There is also some evidence to suggest that pre-existing HIV-specific CTL
responses can block the establishment of a latent HIV infection. This evidence comes from
a number of cases where individuals have generated HIV-specific CTL responses without
becoming infected and appear to be protected from establishing latent HIV infections
despite repeated virus exposure (Rowland-Jones et al., 1995; Parmiaai 1998). Taken
together, these observations suggest that a vaccine capable of generating a broad range of
strong CTL responses may be able to stop individuals from becoming latently infected
with HIV or at least allow infected individuals to remain asymptomatic for life. Virtually
all of the candidate HIV vaccines developed to date have been derived from subtype B
HIV proteins (western world subtype) whereas the majority of the HIV infections
worldwide are caused by subtypes A/E or C (E and A are similar except in the envelope
protein) (referred to as developing world subtypes). Hence existing candidate vaccines may
not be suitable for the more common HIV subtypes. Recently, there has been some
evidence that B subtype vaccines may be partially effective against other common HIV
subtypes (Rowland-Jones et al., 1998). Accordingly, the desirability of a vaccine still
remains, whose effectiveness is substantially complete against all isolates of all strains of
HIV.

SUMMARY OF THE INVENTION

The present invention is predicated in part on a novel strategy for enhancing the 
efficacy of an immunopotentiating composition. This strategy involves utilising the 
sequence information of a parent polypeptide to produce a synthetic polypeptide that 
comprises a plurality of different segments of the parent polypeptide, which are linked
sequentially together in a different arrangement relative to that of the parent polypeptide.
As a result of this change in relationships, the sequence of the linked segments in the
synthetic polypeptide is different to a sequence contained within the parent polypeptide. As
more fully described hereinafter, the present strategy is used advantageously to cause
significant disruption to the structure and/or function of the parent polypeptide while
minimising the destruction of potentially useful epitopes encoded by the parent 
polypeptide. 

Thus, in one aspect of the present invention, there is provided a synthetic 
polypeptide comprising a plurality of different segments of at least one parent polypeptide, 
wherein the segments are linked together in a different relationship relative to their linkage
in the at least one parent polypeptide. 

In one embodiment, the synthetic polypeptide consists essentially of different 
segments of a single parent polypeptide. 

In an alternate embodiment, the synthetic polypeptide consists essentially of
different segments of a plurality of different parent polypeptides.

Suitably, said segments in said synthetic polypeptide are linked sequentially in a 
different order or arrangement relative to that of corresponding segments in said at least 
one parent polypeptide. 

Preferably, at least one of said segments comprises partial sequence identity or 
homology to one or more other said segments. The sequence identity or homology is
preferably contained at one or both ends of said at least one segment. 

In another aspect, the invention resides in a synthetic polynucleotide encoding the
synthetic polypeptide as broadly described above.

According to yet another aspect, the invention contemplates a synthetic construct
comprising a said polynucleotide as broadly described above that is operably linked to a
regulatory polynucleotide.

In a further aspect of the invention, there is provided a method for producing a 
synthetic polynucleotide as broadly described above, comprising:

- linking together in the same reading frame a plurality of nucleic acid sequences 
encoding different segments of at least one parent polypeptide to form a synthetic 
polynucleotide whose sequence encodes said segments linked together in a different 
relationship relative to their linkage in the at least one parent polypeptide. 

Preferably, the method further comprises fragmenting the sequence of a respective
parent polypeptide into fragments and linking said fragments together in a different
relationship relative to their linkage in said parent polypeptide sequence. In a preferred
embodiment of this type, the fragments are randomly linked together.
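The fragmenting and random linking described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `fragment` and `scramble`, the fixed segment size and the fixed overlap are hypothetical and are not taken from the specification, which does not prescribe a particular implementation.

```python
import random


def fragment(sequence, size, overlap):
    """Split a sequence into segments of `size` residues that share
    `overlap` residues with their neighbour (hypothetical parameters)."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    segments = []
    start = 0
    while True:
        segments.append(sequence[start:start + size])
        if start + size >= len(sequence):
            break
        start += step
    return segments


def scramble(segments, seed=None):
    """Link the segments in a random order that differs from the parent order.

    Returns the linked sequence and the order of segment indices used."""
    rng = random.Random(seed)
    order = list(range(len(segments)))
    if len(order) > 1:
        # Reshuffle until the order is no longer the parent order.
        while order == sorted(order):
            rng.shuffle(order)
    return "".join(segments[i] for i in order), order


# Toy parent sequence; real parent polypeptides are far longer.
segments = fragment("ABCDEFGHIJ", size=4, overlap=2)
linked, order = scramble(segments, seed=1)
```

With these toy settings the parent is cut into `ABCD`, `CDEF`, `EFGH` and `GHIJ`, so each two-residue overlap is still present in at least one segment after the order is changed.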

Suitably, the method further comprises reverse translating the sequence of a 
respective parent polypeptide or a segment thereof to provide a nucleic acid sequence
encoding said parent polypeptide or said segment. In a preferred embodiment of this type,
an amino acid of said parent polypeptide sequence is reverse translated to provide a codon,
which has higher translational efficiency than other synonymous codons in a cell of
interest. Suitably, an amino acid of said parent polypeptide sequence is reverse translated
to provide a codon which, in the context of adjacent or local sequence elements, has a
lower propensity of forming an undesirable sequence (e.g., a palindromic sequence or a
duplicated sequence) that is refractory to the execution of a task (e.g., cloning or 
sequencing). 
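A minimal sketch of such context-aware reverse translation follows. The codon preference table is an illustrative assumption (every entry is a valid synonymous codon, but the ranking is not taken from the specification or its figures), and the names `PREFERRED` and `reverse_translate` are hypothetical. The sketch uses the first-choice codon unless doing so would complete a sequence the caller wants to avoid, in which case it tries the next choice.

```python
# Hypothetical first- and second-choice codons per amino acid (illustrative only).
PREFERRED = {
    "A": ("GCC", "GCT"), "R": ("AGA", "CGG"), "N": ("AAC", "AAT"),
    "D": ("GAC", "GAT"), "C": ("TGC", "TGT"), "Q": ("CAG", "CAA"),
    "E": ("GAG", "GAA"), "G": ("GGC", "GGA"), "H": ("CAC", "CAT"),
    "I": ("ATC", "ATT"), "L": ("CTG", "CTC"), "K": ("AAG", "AAA"),
    "M": ("ATG",),       "F": ("TTC", "TTT"), "P": ("CCC", "CCT"),
    "S": ("AGC", "TCC"), "T": ("ACC", "ACA"), "W": ("TGG",),
    "Y": ("TAC", "TAT"), "V": ("GTG", "GTC"),
}


def reverse_translate(protein, avoid=()):
    """Reverse translate `protein`, skipping a codon choice when it would
    complete any motif listed in `avoid` at the growing 3' end."""
    dna = ""
    for residue in protein:
        choices = PREFERRED[residue]
        for codon in choices:
            candidate = dna + codon
            # A motif newly completed by this codon must end within its three
            # bases, so only the last len(motif) + 2 bases need checking.
            if not any(m in candidate[-(len(m) + 2):] for m in avoid):
                break
        else:
            # No choice avoids every motif: fall back to the first choice.
            codon = choices[0]
        dna += codon
    return dna
```

For example, with this table `reverse_translate("LQ")` gives `CTGCAG`, which is a palindromic sequence, whereas `reverse_translate("LQ", avoid=("CTGCAG",))` gives `CTGCAA` by switching glutamine to its second-choice codon.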

In another aspect, the invention encompasses a computer program product for 
designing the sequence of a synthetic polypeptide as broadly described above, comprising:

— code that receives as input the sequence of at least one parent polypeptide; 

— code that fragments the sequence of a respective parent polypeptide into
fragments;

— code that links together said fragments in a different relationship relative to their
linkage in said parent polypeptide sequence; and

— a computer readable medium that stores the codes. 

In yet another aspect, the invention provides a computer program product for 
designing the sequence of a synthetic polynucleotide as broadly described above,
comprising: 

— code that receives as input the sequence of at least one parent polypeptide; 

— code that fragments the sequence of a respective parent polypeptide into
fragments;

— code that reverse translates the sequence of a respective fragment to provide a
nucleic acid sequence encoding said fragment;

— code that links together in the same reading frame each said nucleic acid
sequence to provide a polynucleotide sequence that codes for a polypeptide sequence in
which said fragments are linked together in a different relationship relative to their
linkage in the at least one parent polypeptide sequence; and

— a computer readable medium that stores the codes. 
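The four code elements listed above (receive, fragment, reverse translate, link in frame) can be strung together as in the following sketch. It is a simplified illustration, not the program of the invention: the `design` function, the one-codon-per-residue table `CODON` and the non-overlapping fragmentation are all assumptions made to keep the example short. Because every fragment is reverse translated into whole codons, joining the coding sequences end to end keeps them in one reading frame, and the sketch checks this by translating the result back.

```python
import random

# Hypothetical one-codon-per-residue table (illustrative choices only).
CODON = {
    "A": "GCC", "R": "AGA", "N": "AAC", "D": "GAC", "C": "TGC",
    "Q": "CAG", "E": "GAG", "G": "GGC", "H": "CAC", "I": "ATC",
    "L": "CTG", "K": "AAG", "M": "ATG", "F": "TTC", "P": "CCC",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAC", "V": "GTG",
}
RESIDUE = {codon: residue for residue, codon in CODON.items()}


def design(parents, size, seed=None):
    """Return (polynucleotide, encoded polypeptide) for the given parents."""
    # Receive the parent sequences and fragment each one.
    fragments = [p[i:i + size] for p in parents for i in range(0, len(p), size)]
    # Reverse translate each fragment into a coding sequence.
    coding = ["".join(CODON[residue] for residue in f) for f in fragments]
    # Choose a linkage order that differs from the parent order.
    rng = random.Random(seed)
    order = list(range(len(coding)))
    if len(order) > 1:
        while order == sorted(order):
            rng.shuffle(order)
    # Link the coding sequences; whole codons keep the reading frame intact.
    dna = "".join(coding[i] for i in order)
    # Translate back, codon by codon, to confirm what the sequence encodes.
    protein = "".join(RESIDUE[dna[i:i + 3]] for i in range(0, len(dna), 3))
    return dna, protein


dna, protein = design(["MKTAYIAKQR"], size=3, seed=0)
```

The returned polypeptide contains exactly the residues of the toy parent, in a different arrangement, and the polynucleotide length is a multiple of three.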

In still yet another aspect, the invention provides a computer for designing the
sequence of a synthetic polypeptide as broadly described above, wherein said computer 
comprises: 

(a) a machine-readable data storage medium comprising a data storage material
encoded with machine-readable data, wherein said machine-readable data comprise the
sequence of at least one parent polypeptide;

(b) a working memory for storing instructions for processing said machine-readable 
data; 

(c) a central-processing unit coupled to said working memory and to said
machine-readable data storage medium, for processing said machine readable data to provide said
synthetic polypeptide sequence; and

(d) an output hardware coupled to said central processing unit, for receiving said
synthetic polypeptide sequence.

In a preferred embodiment, the processing of said machine readable data
comprises fragmenting the sequence of a respective parent polypeptide into fragments and
linking together said fragments in a different relationship relative to their linkage in the
sequence of said parent polypeptide.

In still yet another aspect, the invention resides in a computer for designing the
sequence of a synthetic polynucleotide as broadly described above, wherein said computer
comprises:

(a) a machine-readable data storage medium comprising a data storage material
encoded with machine-readable data, wherein said machine-readable data comprise the
sequence of at least one parent polypeptide;

(b) a working memory for storing instructions for processing said machine-readable
data;

(c) a central-processing unit coupled to said working memory and to said
machine-readable data storage medium, for processing said machine readable data to provide said
synthetic polynucleotide sequence; and

(d) an output hardware coupled to said central processing unit, for receiving said 
synthetic polynucleotide sequence. 

In a preferred embodiment, the processing of said machine readable data 
comprises fragmenting the sequence of a respective parent polypeptide into fragments, 
reverse translating the sequence of a respective fragment to provide a nucleic acid
sequence encoding said fragment and linking together in the same reading frame each said 
nucleic acid sequence to provide a polynucleotide sequence that codes for a polypeptide 
sequence in which said fragments are linked together in a different relationship relative to 
their linkage in the at least one parent polypeptide sequence. 

According to another aspect, the invention contemplates a composition,
comprising an immunopotentiating agent selected from the group consisting of a synthetic
polypeptide as broadly described above, a synthetic polynucleotide as broadly described
above and a synthetic construct as broadly described above, together with a
pharmaceutically acceptable carrier.

The composition may optionally comprise an adjuvant.

In a further aspect, the invention encompasses a method for modulating an 
immune response, which response is preferably directed against a pathogen or a cancer, 
comprising administering to a patient in need of such treatment an effective amount of an
immunopotentiating agent selected from the group consisting of a synthetic polypeptide as
broadly described above, a synthetic polynucleotide as broadly described above and a 
synthetic construct as broadly described above, or a composition as broadly described 
above. 

According to still a further aspect of the invention, there is provided a method for
treatment and/or prophylaxis of a disease or condition, comprising administering to a
patient in need of such treatment an effective amount of an immunopotentiating agent
selected from the group consisting of a synthetic polypeptide as broadly described above, a 
synthetic polynucleotide as broadly described above and a synthetic construct as broadly 
described above, or a composition as broadly described above. 

The invention also encompasses the use of the synthetic polypeptide, the synthetic
polynucleotide and the synthetic construct as broadly described above in the study and
modulation of immune responses.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a diagrammatic representation showing the number of people living
with AIDS in 1998 in various parts of the world and most prevalent HIV clades in these
regions. Estimates generated by UNAIDS.

Figure 2 is a graphical representation showing trends in the incidence of the
common HIV clades and estimates for the future. Graph from the International AIDS
Vaccine Initiative (IAVI).

Figure 3 is a diagrammatic representation showing overlapping segments of a 
parent polypeptide sequence for HIV gag [SEQ ID NO: 1] used for the construction of an 
embodiment of an HIV Savine. Also shown are the alignments of common HIV clade
consensus sequences for the HIV gag protein from the HIV Molecular Immunology
Database 1997, Editors Bette Korber, John Moore, Cristian Brander, Richard Koup, Barton
Haynes and Bruce Walker. Publisher, Los Alamos National Laboratory, Theoretical
Biology and Biophysics, Los Alamos, New Mexico, Pub LAUR 98-485. 

Figure 4 is a diagrammatic representation showing overlapping segments of a
parent polypeptide sequence for HIV pol [SEQ ID NO: 2] used for the construction of an
embodiment of an HIV Savine. Also shown are the alignments of common HIV clade
consensus sequences for the HIV pol protein from the HIV Molecular Immunology
Database 1997, Editors Bette Korber, John Moore, Cristian Brander, Richard Koup, Barton
Haynes and Bruce Walker. Publisher, Los Alamos National Laboratory, Theoretical
Biology and Biophysics, Los Alamos, New Mexico, Pub LAUR 98-485.

Figure 5 is a diagrammatic representation showing overlapping segments of a
parent polypeptide sequence for HIV vif [SEQ ID NO: 3] used for the construction of an
embodiment of an HIV Savine. Also shown are the alignments of common HIV clade
consensus sequences for the HIV vif protein from the HIV Molecular Immunology
Database 1997, Editors Bette Korber, John Moore, Cristian Brander, Richard Koup, Barton
Haynes and Bruce Walker. Publisher, Los Alamos National Laboratory, Theoretical
Biology and Biophysics, Los Alamos, New Mexico, Pub LAUR 98-485.

Figure 6 is a diagrammatic representation showing overlapping segments of a
parent polypeptide sequence for HIV vpr [SEQ ID NO: 4] used for the construction of an
embodiment of an HIV Savine. Also shown are the alignments of common HIV clade
consensus sequences for the HIV vpr protein from the HIV Molecular Immunology
Database 1997, Editors Bette Korber, John Moore, Cristian Brander, Richard Koup, Barton
Haynes and Bruce Walker. Publisher, Los Alamos National Laboratory, Theoretical
Biology and Biophysics, Los Alamos, New Mexico, Pub LAUR 98-485.

Figure 7 is a diagrammatic representation showing overlapping segments of a 
parent polypeptide sequence for HIV tat [SEQ ID NO: 5] used for the construction of an 
embodiment of an HIV Savine. Also shown are the alignments of common HIV clade
consensus sequences for the HIV tat protein from the HIV Molecular Immunology 
Database 1997, Editors Bette Korber, John Moore, Cristian Brander, Richard Koup, Barton 
Haynes and Bruce Walker. Publisher, Los Alamos National Laboratory, Theoretical 
Biology and Biophysics, Los Alamos, New Mexico, Pub LAUR 98-485. 

Figure 8 is a diagrammatic representation showing overlapping segments of a
parent polypeptide sequence for HIV rev [SEQ ID NO: 6] used for the construction of an
embodiment of an HIV Savine. Also shown are the alignments of common HIV clade
consensus sequences for the HIV rev protein from the HIV Molecular Immunology
Database 1997, Editors Bette Korber, John Moore, Cristian Brander, Richard Koup, Barton
Haynes and Bruce Walker. Publisher, Los Alamos National Laboratory, Theoretical
Biology and Biophysics, Los Alamos, New Mexico, Pub LAUR 98-485.

Figure 9 is a diagrammatic representation showing overlapping segments of a 
parent polypeptide sequence for HIV vpu [SEQ ID NO: 7] used for the construction of an 
embodiment of an HIV Savine. Also shown are the alignments of common HIV clade
consensus sequences for the HIV vpu protein from the HIV Molecular Immunology
Database 1997, Editors Bette Korber, John Moore, Cristian Brander, Richard Koup, Barton 
Haynes and Bruce Walker. Publisher, Los Alamos National Laboratory, Theoretical 
Biology and Biophysics, Los Alamos, New Mexico, Pub LAUR 98-485. 

Figure 10 is a diagrammatic representation showing overlapping segments of a 
parent polypeptide sequence for HIV env [SEQ ID NO: 8] used for the construction of an 
embodiment of an HIV Savine. Also shown are the alignments of common HIV clade
consensus sequences for the HIV env protein from the HIV Molecular Immunology
Database 1997, Editors Bette Korber, John Moore, Cristian Brander, Richard Koup, Barton 
Haynes and Bruce Walker. Publisher, Los Alamos National Laboratory, Theoretical 
Biology and Biophysics, Los Alamos, New Mexico, Pub LAUR 98-485. 

Figure 11 is a diagrammatic representation showing overlapping segments of a
parent polypeptide sequence for HIV nef [SEQ ID NO: 9] used for the construction of an
embodiment of an HIV Savine. Also shown are the alignments of common HIV clade
consensus sequences for the HIV nef protein from the HIV Molecular Immunology
Database 1997, Editors Bette Korber, John Moore, Cristian Brander, Richard Koup, Barton
Haynes and Bruce Walker. Publisher, Los Alamos National Laboratory, Theoretical
Biology and Biophysics, Los Alamos, New Mexico, Pub LAUR 98-485.

Figure 12 is a diagrammatic representation depicting the systematic segmentation 
of the designed degenerate consensus sequences for each HIV protein and the reverse 
translation of each segment into a DNA sequence. Also shown is the number of segments
used during random rearrangement and amino acids that were removed. Amino acids
surrounded by an open square were removed from the design, because degenerate codons
to cater for the desired amino acid combination required too many degenerate bases to
comply with the incorporation of degenerate sequence rules outlined in the description of
the invention herein. Amino acids surrounded by an open circle were removed only in the
segment concerned mainly because they were coded for in an oligonucleotide overlap
region. Amino acids marked with an asterisk were designed differently in one fragment
compared to the corresponding overlap region (see tat gene).

Figure 13 is a diagrammatic representation showing the first and second most 
frequently used codons in mammals used to reverse translate HIV protein segments. Also
shown are all first and second most frequently used degenerate codons for two amino acids
where only one base is varied. Codons used where more than one base was varied were
worked out in each case by comparing all the codons for each amino acid. The IUPAC
codes for degenerate bases are also shown. 

Figure 14 illustrates the construction plan for the HIV Savine showing the 
approximate sizes of the subcassettes, cassettes and full-length Savine cDNA and the
restriction sites involved in joining them together. Also shown are the extra sequences
added onto each subcassette during their design and a brief description of how the
subcassettes, cassettes and full length cDNA were constructed and transferred into
appropriate DNA plasmids. Description of full length construction: pA was cleaved with
XhoI/SalI and cloned into XhoI arms of the B cassette; pAB was cleaved with XhoI and
cloned into XhoI arms of the C cassette; full length construct is excisable with either
XbaI/BamHI at the 5' end or BglII at the 3' end. Options for excising cassettes: A)
XbaI/BamHI at the 5' end, BglII/XhoI at the 3' end; B) XbaI/BamHI at the 5' end,
BglII/SalI at the 3' end; C) XbaI/BamHI at the 5' end, BglII/SalI at the 3' end. Cleaving
plasmid vectors: pDNAVacc is cleavable with XbaI/XhoI (DNA vaccination); pBCB07 or
pTK7.5 vectors are cleavable with BamHI/SalI (Recombinant Vaccinia); pAvipox vector
pAF09 is cleavable with BamHI/SalI (Recombinant Avipox).

Figure 15 shows the full length DNA (17253 bp) and protein sequence (5742 aas) 
of the HIV Savine construct. Fragment boundaries are shown, together with the position of
each fragment in each designed HIV protein, fragment number (in brackets), spacer
residues (two alanine residues) and which fragment the spacer was for (open boxes and
arrows). The location of residual restriction site joining sequences corresponding to
subcassette or cassette boundaries (shaded boxes) are also shown, along with start and stop
codons, Kozak sequence, the location of the murine influenza virus CTL epitope sequence
(near the 3' end), important restriction sites at each end and the position of each degenerate
amino acid (indicated by 'X').

Figure 16 depicts the layout and position of oligonucleotides in the designed DNA 
sequence for subcassette A1. The sequences which anneal to the short amplification
oligonucleotides are indicated by hatched boxes and the position of oligonucleotide 
overlap regions are dark shaded. 

Figure 17: Panel (a) depicts the stepwise asymmetric PCR of the two halves of
subcassette A1 (lanes 2-5 and 7-9, respectively) and final splicing together by SOEing
(lane 10). DNA standards in lane 1 are pUC18 digested with Sau3AI. Panel (b) shows the
stepwise ligation-mediated joining and PCR amplification of each cassette as indicated.
DNA standards in lane 1 are SPP1 cut with EcoRI.

Figure 18: Panel (a) shows a summary of the construction of the DNA vaccine
plasmids that express one HIV Savine cassette. Panel (b) shows a summary of the
construction of the plasmids used for marker rescue recombination to generate Vaccinia
viruses expressing one HIV Savine cassette. Panel (c) shows a summary of the
construction of the DNA vaccine plasmids which each express a version of the full-length
HIV Savine cDNA.

Figure 19 shows restimulation of HIV specific polyclonal CTL responses from
three HIV-infected patients by the HIV Savine constructs. PBMCs from three different
patients were restimulated for 7 days by infection with Vaccinia virus pools expressing the
HIV Savine cassettes: Pool 1 included VV-AC1 and VV-BC1; Pool 2 included VV-AC2,
VV-BC2 and VV-CC2. The restimulated PBMCs were then mixed with autologous LCLs
(effector to target ratio of 50:1), which were either uninfected or infected with either
Vaccinia viruses expressing the HIV proteins gag (VV-gag), env (VV-env) or pol
(VV-pol), VV-HIV Savine pools 1 (light bars) or 2 (dark bars) or a control Vaccinia virus
(VV-Lac) and the amount of 51Cr released used to determine percent specific lysis. K562 cells
were used to determine the level of NK cell-mediated killing in their stimulated culture.

Figure 20 is a diagrammatic representation showing CD4+ proliferation of
PBMCs from HIV-1 infected patients restimulated with either Pool 1 or Pool 2 of the HIV-1
Savine. Briefly, PBMCs were stained with CFSE and cultured for 6 days with or without
VVs encoding either Pool 1 or Pool 2 of the HIV-1 Savine. Restimulated cells were then
labelled with antibodies and analysed by FACS.

Figure 21 is a graphical representation showing the CTL response in mice
vaccinated with the HIV Savine. C57BL6 mice were immunised with the HIV-1 Savine
DNA vaccine comprising the six plasmids described in Figure 18a (100 µg total DNA was
given as 50 µg/leg i.m.). One week later Poxviruses (1x10^ pfu) comprising Pool 1 of the
HIV-1 Savine were used to boost the immune responses. Three weeks later splenocytes
from these mice were restimulated with VV-Pool 1 or VV-Pool 2 for 5 days and the
resultant effectors used in a 51Cr release cytotoxicity assay against targets infected with
CTRVV, VV-pools or VV expressing the natural antigens from HIV-1.

Figure 22 shows immune responses of HIV Immune Macaques (vaccinated with 
recombinant FPV expressing gag-pol and challenged with HIV-1 2 years prior to 
experiment). Monkeys 1 and 2 were immunised once at day 0 with VV Savine pool 1
(three VVs which together express the entire HIV Savine). Monkey 3 was immunised
twice with FPV-gag-pol, i.e., Day 0 is 3 weeks after first FPV-gag-pol immunisation. A)
IFN-γ detection by ELISPOT of whole blood (0.5 mL, venous blood
heparin-anticoagulated) stimulated with Aldrithiol-2 inactivated whole HIV-1 (20 hours, 20
µg/mL). Plasma samples were then centrifuged (1000 x g) and assayed in duplicate for
antigen-specific IFN using capture ELISA. B) Flow cytometric detection of HIV-1 specific
CD69+/CD8+ T cells. Freshly isolated PBMCs were stimulated with inactivated HIV-1 as
above for 16 hours, washed and labelled with the antibodies. Cells were then analysed
using a FACScalibur™ flow cytometer and data analysed using Cell-Quest software. C)
Flow cytometric detection of HIV-1 specific CD69+/CD4+ T cells carried out as in B). 

Figure 23 shows a diagram of a system used to carry out the instructions encoded 
by the storage medium of Figures 28 and 29. 

Figure 24 depicts a flow diagram showing an embodiment of a method for 
designing synthetic polynucleotides and synthetic polypeptides of the invention. 

Figure 25 shows an algorithm, which inter alia utilises the steps of the method 
shown in Figure 24. 

Figure 26 shows an example of applying the algorithm of Figure 25 to an input 
consensus polyprotein sequence of Hepatitis C 1a to execute the segmentation of the 
polyprotein sequence, the rearrangement of the segments, the linkage of the rearranged 
segments and the outputting of synthetic polynucleotide and polypeptide sequences for the 
preparation of Savines for treating and/or preventing Hepatitis C infection. 
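
By way of illustration only, the segmentation, rearrangement and linkage steps named above can be sketched in Python as follows. The sketch is not the algorithm of Figure 25. The window of 30 residues and the overlap of 15 residues are assumptions inferred from the segment lengths listed in Table A (for example, a 499 aa GAG consensus giving 33 segments, the last of 19 aa); the function names, the seeded shuffle and the placeholder input sequence are hypothetical.

```python
import random


def segment(sequence, size=30, overlap=15):
    """Split a sequence into windows of `size` residues, each sharing
    `overlap` residues with the previous one; the last may be shorter."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    if not sequence:
        return []
    step = size - overlap
    segments = []
    start = 0
    while True:
        segments.append(sequence[start:start + size])
        if start + size >= len(sequence):
            break  # this window reached the end of the sequence
        start += step
    return segments


def rearrange_and_link(segments, seed=0):
    """Shuffle the segments reproducibly (assumed scheme) and join them
    end to end into a single scrambled sequence."""
    shuffled = list(segments)
    random.Random(seed).shuffle(shuffled)
    return "".join(shuffled)


# Placeholder input: a 499-residue string standing in for a consensus
# polypeptide; it is not a sequence from the specification.
parent = "".join("ACDEFGHIKLMNPQRSTVWY"[i % 20] for i in range(499))
segments = segment(parent)
synthetic = rearrange_and_link(segments)
print(len(segments), len(segments[-1]), len(synthetic))
```

With these assumed parameters the segment counts and final segment lengths agree with those given in Table A for the GAG, POL and VPR consensus polypeptides.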

Figure 27 illustrates an example of applying the algorithm of Figure 25 to input 
consensus melanocyte differentiation antigens (gp100, MART, TRP-1, Tyros, Trp-2, 
MC1R, MUC1F and MUC1R) and to consensus melanoma specific antigens (BAGE, 
GAGE-1, gp100In4, MAGE-1, MAGE-3, PRAME, TRP2IN2, NYNSO1a, NYNSO1b and 
LAGE1) to facilitate segmentation of those sequences, to rearrange the segments, to link 
the rearranged segments and to output synthetic polynucleotide and polypeptide sequences 
for the preparation of Savines for treating and/or preventing melanoma. 

Figure 28 shows a cross section of a magnetic storage medium. 

Figure 29 shows a cross section of an optically readable data storage medium. 
Figure 30 shows six HIV Savine cassette sequences (A1 [SEQ ID NO: 393], A2 
[SEQ ID NO: 399], B1 [SEQ ID NO: 395], B2 [SEQ ID NO: 401], C1 [SEQ ID NO: 397] 
and C2 [SEQ ID NO: 403]). A1, B1 and C1 can be joined together using, for example, 
convenient restriction enzyme sites provided at the ends of each cassette to construct an 
embodiment of a full length HIV Savine [SEQ ID NO: 405]. A2, B2 and C2 can also be 
joined together to provide another embodiment of a full length HIV Savine with 350 aa 
mutations common in major HIV clades. The cassettes A/B/C can be joined into single 
constructs using specific restriction enzyme sites incorporated after the start codon or 
before the stop codon in the cassettes. 



WO 01/90197 PCT/AU01/00622 

BRIEF DESCRIPTION OF THE SEQUENCES: SUMMARY TABLE 

TABLE A 



SEQUENCE ID 
NUMBER 


SEQUENCE 


LENGTH 


SEQ ID NO: 1 


GAG consensus polypeptide 


499 aa 


SEQ ID NO: 2 


POL consensus polypeptide 


995 aa 


SEQ ID NO: 3 


VIF consensus polypeptide 


192 aa 


SEQ ID NO: 4 


VPR consensus polypeptide 


96 aa 


SEQ ID NO: 5 


TAT consensus polypeptide 


102 aa 


SEQ ID NO: 6 


REV consensus polypeptide 


123 aa 


SEQ ID NO: 7 


VPU consensus polypeptide 


81 aa 


SEQ ID NO: 8 


ENV consensus polypeptide 


651 aa 


SEQ ID NO: 9 


NEF consensus polypeptide 


206 aa 


SEQ ID NO: 10 


GAG segment 1 


90nts 


SEQ ID NO: 11 


Polypeptide encoded by SEQ ID NO: 10 


30 aa 


SEQ ID NO: 12 


GAG segment 2 


90nts 


SEQ ID NO: 13 


Polypeptide encoded by SEQ ID NO: 12 


30 aa 


SEQ ID NO: 14 


GAG segment 3 


90nts 


SEQ ID NO: 15 


Polypeptide encoded by SEQ ID NO: 14 


30 aa 


SEQ ID NO: 16 


GAG segment 4 


90nts 


SEQ ID NO: 17 


Polypeptide encoded by SEQ ID NO: 16 


30 aa 


SEQ ID NO: 18 


GAG segment 5 


90nts 


SEQ ID NO: 19 


Polypeptide encoded by SEQ ID NO: 18 


30 aa 


SEQ ID NO: 20 


GAG segment 6 


90nts 


SEQ ID NO: 21 


Polypeptide encoded by SEQ ID NO: 20 


30 aa 


SEQ ID NO: 22 


GAG segment 7 


90nts 


SEQ ID NO: 23 


Polypeptide encoded by SEQ ID NO: 22 


30 aa 


SEQ ID NO: 24 


GAG segment 8 


90nts 


SEQ ID NO: 25 


Polypeptide encoded by SEQ ID NO: 24 


30 aa 


SEQ ID NO: 26 


GAG segment 9 


90nts 


SEQ ID NO: 27 


Polypeptide encoded by SEQ ID NO: 26 


30 aa 


SEQ ID NO: 28 


GAG segment 10 


90nts 


SEQ ID NO: 29 


Polypeptide encoded by SEQ ID NO: 28 


30 aa 


SEQ ID NO: 30 


GAG segment 11 


90nts 


SEQ ID NO: 31 


Polypeptide encoded by SEQ ID NO: 30 


30 aa 


SEQ ID NO: 32 


GAG segment 12 


90nts 


SEQ ID NO: 33 


Polypeptide encoded by SEQ ID NO: 32 


30 aa 


SEQ ID NO: 34 


GAG segment 13 


90nts 


SEQ ID NO: 35 


Polypeptide encoded by SEQ ID NO: 34 


30 aa 


SEQ ID NO: 36 


GAG segment 14 


90 nts 


SEQ ID NO: 37 


Polypeptide encoded by SEQ ID NO: 36 


30 aa 


SEQ ID NO: 38 


GAG segment 15 


90 nts 


SEQ ID NO: 39 


Polypeptide encoded by SEQ ID NO: 38 


30 aa 


SEQ ID NO: 40 


GAG segment 16 


90 nts 


SEQ ID NO: 41 


Polypeptide encoded by SEQ ID NO: 40 


30 aa 


SEQ ID NO: 42 


GAG segment 17 


90 nts 


SEQ ID NO: 43 


Polypeptide encoded by SEQ ID NO: 42 


30 aa 


SEQ ID NO: 44 


GAG segment 18 


90 nts 


SEQ ID NO: 45 


Polypeptide encoded by SEQ ID NO: 44 


30 aa 


SEQ ID NO: 46 


GAG segment 19 


90 nts 


SEQ ID NO: 47 


Polypeptide encoded by SEQ ID NO: 46 


30 aa 


SEQ ID NO: 48 


GAG segment 20 


90nts 


SEQ ID NO: 49 


Polypeptide encoded by SEQ ID NO: 48 


30 aa 


SEQ ID NO: 50 


GAG segment 21 


90nts 


SEQ ID NO: 51 


Polypeptide encoded by SEQ ID NO: 50 


30 aa 


SEQ ID NO: 52 


GAG segment 22 


90nts 


SEQ ID NO: 53 


Polypeptide encoded by SEQ ID NO: 52 


30 aa 


SEQ ID NO: 54 


GAG segment 23 


90nts 


SEQ ID NO: 55 


Polypeptide encoded by SEQ ID NO: 54 


30 aa 


SEQ ID NO: 56 


GAG segment 24 


90nts 


SEQ ID NO: 57 


Polypeptide encoded by SEQ ID NO: 56 


30 aa 


SEQ ID NO: 58 


GAG segment 25 


90nts 


SEQ ID NO: 59 


Polypeptide encoded by SEQ ID NO: 58 


30 aa 


SEQ ID NO: 60 


GAG segment 26 


90nts 


SEQ ID NO: 61 


Polypeptide encoded by SEQ ID NO: 60 


30 aa 


SEQ ID NO: 62 


GAG segment 27 


90nts 


SEQ ID NO: 63 


Polypeptide encoded by SEQ ID NO: 62 


30 aa 


SEQ ID NO: 64 


GAG segment 28 


90nts 


SEQ ID NO: 65 


Polypeptide encoded by SEQ ID NO: 64 


30 aa 


SEQ ID NO: 66 


GAG segment 29 


90nts 


SEQ ID NO: 67 


Polypeptide encoded by SEQ ID NO: 66 


30 aa 


SEQ ID NO: 68 


GAG segment 30 


90nts 


SEQ ID NO: 69 


Polypeptide encoded by SEQ ID NO: 68 


30 aa 


SEQ ID NO: 70 


GAG segment 31 


90nts 


SEQ ID NO: 71 


Polypeptide encoded by SEQ ID NO: 70 


30 aa 


SEQ ID NO: 72 


GAG segment 32 


90nts 


SEQ ID NO: 73 


Polypeptide encoded by SEQ ID NO: 72 


30 aa 


SEQ ID NO: 74 


GAG segment 33 


57nts 


SEQ ID NO: 75 


Polypeptide encoded by SEQ ID NO: 74 


19 aa 


SEQ ID NO: 76 


POL segment 1 


90nts 


SEQ ID NO: 77 


Polypeptide encoded by SEQ ID NO: 76 


30 aa 


SEQ ID NO: 78 


POL segment 2 


90nts 


SEQ ID NO: 79 


Polypeptide encoded by SEQ ID NO: 78 


30 aa 


SEQ ID NO: 80 


POL segment 3 


90nts 


SEQ ID NO: 81 


Polypeptide encoded by SEQ ID NO: 80 


30 aa 


SEQ ID NO: 82 


POL segment 4 


90nts 


SEQ ID NO: 83 


Polypeptide encoded by SEQ ID NO: 82 


30 aa 


SEQ ID NO: 84 


POL segment 5 


90nts 


SEQ ID NO: 85 


Polypeptide encoded by SEQ ID NO: 84 


30 aa 


SEQ ID NO: 86 


POL segment 6 


90nts 


SEQ ID NO: 87 


Polypeptide encoded by SEQ ID NO: 86 


30 aa 


SEQ ID NO: 88 


POL segment 7 


90nts 


SEQ ID NO: 89 


Polypeptide encoded by SEQ ID NO: 88 


30 aa 


SEQ ID NO: 90 


POL segment 8 


90nts 


SEQ ID NO: 91 


Polypeptide encoded by SEQ ID NO: 90 


30 aa 


SEQ ID NO: 92 


POL segment 9 


90nts 


SEQ ID NO: 93 


Polypeptide encoded by SEQ ID NO: 92 


30 aa 


SEQ ID NO: 94 


POL segment 10 


90nts 


SEQ ID NO: 95 


Polypeptide encoded by SEQ ID NO: 94 


30 aa 


SEQ ID NO: 96 


POL segment 11 


90nts 


SEQ ID NO: 97 


Polypeptide encoded by SEQ ID NO: 96 


30 aa 


SEQ ID NO: 98 


POL segment 12 


90nts 


SEQ ID NO: 99 


Polypeptide encoded by SEQ ID NO: 98 


30 aa 


SEQ ID NO: 100 


POL segment 13 


90nts 


SEQ ID NO: 101 


Polypeptide encoded by SEQ ID NO: 100 


30 aa 


SEQ ID NO: 102 


POL segment 14 


90nts 


SEQ ID NO: 103 


Polypeptide encoded by SEQ ID NO: 102 


30 aa 


SEQ ID NO: 104 


POL segment 15 


90nts 


SEQ ID NO: 105 


Polypeptide encoded by SEQ ID NO: 104 


30 aa 


SEQ ID NO: 106 


POL segment 16 


90nts 


SEQ ID NO: 107 


Polypeptide encoded by SEQ ID NO: 106 


30 aa 


SEQ ID NO: 108 


POL segment 17 


90nts 


SEQ ID NO: 109 


Polypeptide encoded by SEQ ID NO: 108 


30 aa 


SEQ ID NO: 110 


POL segment 18 


90nts 


SEQ ID NO: 111 


Polypeptide encoded by SEQ ID NO: 110 


30 aa 


SEQ ID NO: 112 


POL segment 19 


90nts 


SEQ ID NO: 113 


Polypeptide encoded by SEQ ID NO: 112 


30 aa 


SEQ ID NO: 114 


POL segment 20 


90nts 


SEQ ID NO: 115 


Polypeptide encoded by SEQ ID NO: 114 


30 aa 


SEQ ID NO: 116 


POL segment 21 


90nts 


SEQ ID NO: 117 


Polypeptide encoded by SEQ ID NO: 116 


30 aa 


SEQ ID NO: 118 


POL segment 22 


90nts 


SEQ ID NO: 119 


Polypeptide encoded by SEQ ID NO: 118 


30 aa 


SEQ ID NO: 120 


POL segment 23 


90nts 


SEQ ID NO: 121 


Polypeptide encoded by SEQ ID NO: 120 


30 aa 


SEQ ID NO: 122 


POL segment 24 


90nts 


SEQ ID NO: 123 


Polypeptide encoded by SEQ ID NO: 122 


30 aa 


SEQ ID NO: 124 


POL segment 25 


90nts 


SEQ ID NO: 125 


Polypeptide encoded by SEQ ID NO: 124 


30 aa 


SEQ ID NO: 126 


POL segment 26 


90nts 


SEQ ID NO: 127 


Polypeptide encoded by SEQ ID NO: 126 


30 aa 


SEQ ID NO: 128 


POL segment 27 


90nts 


SEQ ID NO: 129 


Polypeptide encoded by SEQ ID NO: 128 


30 aa 


SEQ ID NO: 130 


POL segment 28 


90 nts 


SEQ ID NO: 131 


Polypeptide encoded by SEQ ID NO: 130 


30 aa 


SEQ ID NO: 132 


POL segment 29 


90 nts 


SEQ ID NO: 133 


Polypeptide encoded by SEQ ID NO: 132 


30 aa 


SEQ ID NO: 134 


POL segment 30 


90 nts 


SEQ ID NO: 135 


Polypeptide encoded by SEQ ID NO: 134 


30 aa 


SEQ ID NO: 136 


POL segment 31 


90 nts 


SEQ ID NO: 137 


Polypeptide encoded by SEQ ID NO: 136 


30 aa 


SEQ ID NO: 138 


POL segment 32 


90 nts 


SEQ ID NO: 139 


Polypeptide encoded by SEQ ID NO: 138 


30 aa 


SEQ ID NO: 140 


POL segment 33 


90 nts 


SEQ ID NO: 141 


Polypeptide encoded by SEQ ID NO: 140 


30 aa 


SEQ ID NO: 142 


POL segment 34 


90 nts 


SEQ ID NO: 143 


Polypeptide encoded by SEQ ID NO: 142 


30 aa 


SEQ ID NO: 144 


POL segment 35 


90nts 


SEQ ID NO: 145 


Polypeptide encoded by SEQ ID NO: 144 


30 aa 


SEQ ID NO: 146 


POL segment 36 


90nts 


SEQ ID NO: 147 


Polypeptide encoded by SEQ ID NO: 146 


30 aa 


SEQ ID NO: 148 


POL segment 37 


90nts 


SEQ ID NO: 149 


Polypeptide encoded by SEQ ID NO: 148 


30 aa 


SEQ ID NO: 150 


POL segment 38 


90nts 


SEQ ID NO: 151 


Polypeptide encoded by SEQ ID NO: 150 


30 aa 


SEQ ID NO: 152 


POL segment 39 


90nts 


SEQ ID NO: 153 


Polypeptide encoded by SEQ ID NO: 152 


30 aa 


SEQ ID NO: 154 


POL segment 40 


90nts 


SEQ ID NO: 155 


Polypeptide encoded by SEQ ID NO: 154 


30 aa 


SEQ ID NO: 156 


POL segment 41 


90nts 


SEQ ID NO: 157 


Polypeptide encoded by SEQ ID NO: 156 


30 aa 


SEQ ID NO: 158 


POL segment 42 


90nts 


SEQ ID NO: 159 


Polypeptide encoded by SEQ ID NO: 158 


30 aa 


SEQ ID NO: 160 


POL segment 43 


90nts 


SEQ ID NO: 161 


Polypeptide encoded by SEQ ID NO: 160 


30 aa 


SEQ ID NO: 162 


POL segment 44 


90nts 


SEQ ID NO: 163 


Polypeptide encoded by SEQ ID NO: 162 


30 aa 


SEQ ID NO: 164 


POL segment 45 


90nts 


SEQ ID NO: 165 


Polypeptide encoded by SEQ ID NO: 164 


30 aa 


SEQ ID NO: 166 


POL segment 46 


90nts 


SEQ ID NO: 167 


Polypeptide encoded by SEQ ID NO: 166 


30 aa 


SEQ ID NO: 168 


POL segment 47 


90nts 


SEQ ID NO: 169 


Polypeptide encoded by SEQ ID NO: 168 


30 aa 


SEQ ID NO: 170 


POL segment 48 


90nts 


SEQ ID NO: 171 


Polypeptide encoded by SEQ ID NO: 170 


30 aa 


SEQ ID NO: 172 


POL segment 49 


90nts 


SEQ ID NO: 173 


Polypeptide encoded by SEQ ID NO: 172 


30 aa 


SEQ ID NO: 174 


POL segment 50 


90nts 


SEQ ID NO: 175 


Polypeptide encoded by SEQ ID NO: 174 


30 aa 


SEQ ID NO: 176 


POL segment 51 


90 nts 


SEQ ID NO: 177 


Polypeptide encoded by SEQ ID NO: 176 


30 aa 


SEQ ID NO: 178 


POL segment 52 


90 nts 


SEQ ID NO: 179 


Polypeptide encoded by SEQ ID NO: 178 


30 aa 


SEQ ID NO: 180 


POL segment 53 


90 nts 


SEQ ID NO: 181 


Polypeptide encoded by SEQ ID NO: 180 


30 aa 


SEQ ID NO: 182 


POL segment 54 


90 nts 


SEQ ID NO: 183 


Polypeptide encoded by SEQ ID NO: 182 


30 aa 


SEQ ID NO: 184 


POL segment 55 


90 nts 


SEQ ID NO: 185 


Polypeptide encoded by SEQ ID NO: 184 


30 aa 


SEQ ID NO: 186 


POL segment 56 


90 nts 


SEQ ID NO: 187 


Polypeptide encoded by SEQ ID NO: 186 


30 aa 


SEQ ID NO: 188 


POL segment 57 


90 nts 


SEQ ID NO: 189 


Polypeptide encoded by SEQ ID NO: 188 


30 aa 


SEQ ID NO: 190 


POL segment 58 


90 nts 


SEQ ID NO: 191 


Polypeptide encoded by SEQ ID NO: 190 


30 aa 


SEQ ID NO: 192 


POL segment 59 


90nts 


SEQ ID NO: 193 


Polypeptide encoded by SEQ ID NO: 192 


30 aa 


SEQ ID NO: 194 


POL segment 60 


90nts 


SEQ ID NO: 195 


Polypeptide encoded by SEQ ID NO: 194 


30 aa 


SEQ ID NO: 196 


POL segment 61 


90nts 


SEQ ID NO: 197 


Polypeptide encoded by SEQ ID NO: 196 


30 aa 


SEQ ID NO: 198 


POL segment 62 


90nts 


SEQ ID NO: 199 


Polypeptide encoded by SEQ ID NO: 198 


30 aa 


SEQ ID NO: 200 


POL segment 63 


90nts 


SEQ ID NO: 201 


Polypeptide encoded by SEQ ID NO: 200 


30 aa 


SEQ ID NO: 202 


POL segment 64 


90nts 


SEQ ID NO: 203 


Polypeptide encoded by SEQ ID NO: 202 


30 aa 


SEQ ID NO: 204 


POL segment 65 


90nts 


SEQ ID NO: 205 


Polypeptide encoded by SEQ ID NO: 204 


30 aa 


SEQ ID NO: 206 


POL segment 66 


60nts 


SEQ ID NO: 207 


Polypeptide encoded by SEQ ID NO: 206 


20 aa 


SEQ ID NO: 208 


VIF segment 1 


90nts 


SEQ ID NO: 209 


Polypeptide encoded by SEQ ID NO: 208 


30 aa 


SEQ ID NO: 210 


VIF segment 2 


90nts 


SEQ ID NO: 211 


Polypeptide encoded by SEQ ID NO: 210 


30 aa 


SEQ ID NO: 212 


VIF segment 3 


90nts 


SEQ ID NO: 213 


Polypeptide encoded by SEQ ID NO: 212 


30 aa 


SEQ ID NO: 214 


VIF segment 4 


90nts 


SEQ ID NO: 215 


Polypeptide encoded by SEQ ID NO: 214 


30 aa 


SEQ ID NO: 216 


VIF segment 5 


90nts 


SEQ ID NO: 217 


Polypeptide encoded by SEQ ID NO: 216 


30 aa 


SEQ ID NO: 218 


VIF segment 6 


90nts 


SEQ ID NO: 219 


Polypeptide encoded by SEQ ID NO: 218 


30 aa 


SEQ ID NO: 220 


VIF segment 7 


90nts 


SEQ ID NO: 221 


Polypeptide encoded by SEQ ID NO: 220 


30 aa 


SEQ ID NO: 222 


VIF segment 8 


90nts 


SEQ ID NO: 223 


Polypeptide encoded by SEQ ID NO: 222 


30 aa 


SEQ ID NO: 224 


VIF segment 9 


90nts 


SEQ ID NO: 225 


Polypeptide encoded by SEQ ID NO: 224 


30 aa 


SEQ ID NO: 226 


VIF segment 10 


90nts 


SEQ ID NO: 227 


Polypeptide encoded by SEQ ID NO: 226 


30 aa 


SEQ ID NO: 228 


VIF segment 11 


90nts 


SEQ ID NO: 229 


Polypeptide encoded by SEQ ID NO: 228 


30 aa 


SEQ ID NO: 230 


VIF segment 12 


81 nts 


SEQ ID NO: 231 


Polypeptide encoded by SEQ ID NO: 230 


27 aa 


SEQ ID NO: 232 


VPR segment 1 


90 nts 


SEQ ID NO: 233 


Polypeptide encoded by SEQ ID NO: 232 


30 aa 


SEQ ID NO: 234 


VPR segment 2 


90 nts 


SEQ ID NO: 235 


Polypeptide encoded by SEQ ID NO: 234 


30 aa 


SEQ ID NO: 236 


VPR segment 3 


90 nts 


SEQ ID NO: 237 


Polypeptide encoded by SEQ ID NO: 236 


30 aa 


SEQ ID NO: 238 


VPR segment 4 


90 nts 


SEQ ID NO: 239 


Polypeptide encoded by SEQ ID NO: 238 


30 aa 


SEQ ID NO: 240 


VPR segment 5 


90nts 


SEQ ID NO: 241 


Polypeptide encoded by SEQ ID NO: 240 


30 aa 


SEQ ID NO: 242 


VPR segment 6 


63 nts 


SEQ ID NO: 243 


Polypeptide encoded by SEQ ID NO: 242 


21 aa 


SEQ ID NO: 244 


TAT segment 1 


90 nts 


SEQ ID NO: 245 


Polypeptide encoded by SEQ ID NO: 244 


30 aa 


SEQ ID NO: 246 


TAT segment 2 


90 nts 


SEQ ID NO: 247 


Polypeptide encoded by SEQ ID NO: 246 


30 aa 


SEQ ID NO: 248 


TAT segment 3 


90 nts 


SEQ ID NO: 249 


Polypeptide encoded by SEQ ID NO: 248 


30 aa 


SEQ ID NO: 250 


TAT segment 4 


90 nts 


SEQ ID NO: 251 


Polypeptide encoded by SEQ ID NO: 250 


30 aa 


SEQ ID NO: 252 


TAT segment 5 


90 nts 


SEQ ID NO: 253 


Polypeptide encoded by SEQ ID NO: 252 


30 aa 


SEQ ID NO: 254 


TAT segment 6 


81 nts 


SEQ ID NO: 255 


Polypeptide encoded by SEQ ID NO: 254 


27 aa 


SEQ ID NO: 256 


REV segment 1 


90 nts 


SEQ ID NO: 257 


Polypeptide encoded by SEQ ID NO: 256 


30 aa 


SEQ ID NO: 258 


REV segment 2 


90 nts 


SEQ ID NO: 259 


Polypeptide encoded by SEQ ID NO: 258 


30 aa 


SEQ ID NO: 260 


REV segment 3 


90 nts 


SEQ ID NO: 261 


Polypeptide encoded by SEQ ID NO: 260 


30 aa 


SEQ ID NO: 262 


REV segment 4 


90 nts 


SEQ ID NO: 263 


Polypeptide encoded by SEQ ID NO: 262 


30 aa 


SEQ ID NO: 264 


REV segment 5 


90nts 


SEQ ID NO: 265 


Polypeptide encoded by SEQ ID NO: 264 


30 aa 


SEQ ID NO: 266 


REV segment 6 


90nts 


SEQ ID NO: 267 


Polypeptide encoded by SEQ ID NO: 266 


30 aa 


SEQ ID NO: 268 


REV segment 7 


90nts 


SEQ ID NO: 269 


Polypeptide encoded by SEQ ID NO: 268 


30 aa 


SEQ ID NO: 270 


REV segment 8 


54nts 


SEQ ID NO: 271 


Polypeptide encoded by SEQ ID NO: 270 


18 aa 


SEQ ID NO: 272 


VPU segment 1 


90nts 


SEQ ID NO: 273 


Polypeptide encoded by SEQ ID NO: 272 


30 aa 


SEQ ID NO: 274 


VPU segment 2 


90nts 


SEQ ID NO: 275 


Polypeptide encoded by SEQ ID NO: 274 


30 aa 


SEQ ID NO: 276 


VPU segment 3 


90nts 


SEQ ID NO: 277 


Polypeptide encoded by SEQ ID NO: 276 


30 aa 


SEQ ID NO: 278 


VPU segment 4 


90nts 


SEQ ID NO: 279 


Polypeptide encoded by SEQ ID NO: 278 


30 aa 


SEQ ID NO: 280 


VPU segment 5 


63 nts 


SEQ ID NO: 281 


Polypeptide encoded by SEQ ID NO: 280 


21 aa 


SEQ ID NO: 282 


ENV segment 1 


90 nts 


SEQ ID NO: 283 


Polypeptide encoded by SEQ ID NO: 282 


30 aa 


SEQ ID NO: 284 


ENV segment 2 


90 nts 


SEQ ID NO: 285 


Polypeptide encoded by SEQ ID NO: 284 


30 aa 


SEQ ID NO: 286 


ENV segment 3 


90 nts 


SEQ ID NO: 287 


Polypeptide encoded by SEQ ID NO: 286 


30 aa 


SEQ ID NO: 288 


ENV segment 4 


90nts 


SEQ ID NO: 289 


Polypeptide encoded by SEQ ID NO: 288 


30 aa 


SEQ ID NO: 290 


ENV segment 5 


90nts 


SEQ ID NO: 291 


Polypeptide encoded by SEQ ID NO: 290 


30 aa 


SEQ ID NO: 292 


ENV segment 6 


90nts 


SEQ ID NO: 293 


Polypeptide encoded by SEQ ID NO: 292 


30 aa 


SEQ ID NO: 294 


ENV segment 7 


90nts 


SEQ ID NO: 295 


Polypeptide encoded by SEQ ID NO: 294 


30 aa 


SEQ ID NO: 296 


ENV segment 8 


90nts 


SEQ ID NO: 297 


Polypeptide encoded by SEQ ID NO: 296 


30 aa 


SEQ ID NO: 298 


ENV segment 9 


57nts 


SEQ ID NO: 299 


Polypeptide encoded by SEQ ID NO: 298 


19 aa 


SEQ ID NO: 300 


GAP A segment 1 


90nts 


SEQ ID NO: 301 


Polypeptide encoded by SEQ ID NO: 300 


30 aa 


SEQ ID NO: 302 


GAP A segment 2 


90nts 


SEQ ID NO: 303 


Polypeptide encoded by SEQ ID NO: 302 


30 aa 


SEQ ID NO: 304 


GAP A segment 3 


90nts 


SEQ ID NO: 305 


Polypeptide encoded by SEQ ID NO: 304 


30 aa 


SEQ ID NO: 306 


GAP A segment 4 


90nts 


SEQ ID NO: 307 


Polypeptide encoded by SEQ ID NO: 306 


30 aa 


SEQ ID NO: 308 


GAP A segment 5 


90nts 


SEQ ID NO: 309 


Polypeptide encoded by SEQ ID NO: 308 


30 aa 


SEQ ID NO: 310 


GAP A segment 6 


90nts 


SEQ ID NO: 311 


Polypeptide encoded by SEQ ID NO: 310 


30 aa 


SEQ ID NO: 312 


GAP A segment 7 


75nts 


SEQ ID NO: 313 


Polypeptide encoded by SEQ ID NO: 312 


25 aa 


SEQ ID NO: 314 


GAP B segment 1 


90nts 


SEQ ID NO: 315 


Polypeptide encoded by SEQ ID NO: 314 


30 aa 


SEQ ID NO: 316 


GAP B segment 2 


90 nts 


SEQ ID NO: 317 


Polypeptide encoded by SEQ ID NO: 316 


30 aa 


SEQ ID NO: 318 


GAP B segment 3 


90 nts 


SEQ ID NO: 319 


Polypeptide encoded by SEQ ID NO: 318 


30 aa 


SEQ ID NO: 320 


GAP B segment 4 


90 nts 


SEQ ID NO: 321 


Polypeptide encoded by SEQ ID NO: 320 


30 aa 


SEQ ID NO: 322 


GAP B segment 5 


90 nts 


SEQ ID NO: 323 


Polypeptide encoded by SEQ ID NO: 322 


30 aa 


SEQ ID NO: 324 


GAP B segment 6 


90 nts 


SEQ ID NO: 325 


Polypeptide encoded by SEQ ID NO: 324 


30 aa 


SEQ ID NO: 326 


GAP B segment 7 


90 nts 


SEQ ID NO: 327 


Polypeptide encoded by SEQ ID NO: 326 


30 aa 


SEQ ID NO: 328 


GAP B segment 8 


90 nts 


SEQ ID NO: 329 


Polypeptide encoded by SEQ ID NO: 328 


30 aa 


SEQ ID NO: 330 


GAP B segment 9 


90 nts 


SEQ ID NO: 331 


Polypeptide encoded by SEQ ID NO: 330 


30 aa 


SEQ ID NO: 332 


GAP B segment 10 


90 nts 


SEQ ID NO: 333 


Polypeptide encoded by SEQ ID NO: 332 


30 aa 


SEQ ID NO: 334 


GAP B segment 11 


90 nts 


SEQ ID NO: 335 


Polypeptide encoded by SEQ ID NO: 334 


30 aa 


SEQ ID NO: 336 


GAP B segment 12 


90 nts 


SEQ ID NO: 337 


Polypeptide encoded by SEQ ID NO: 336 


30 aa 


SEQ ID NO: 338 


GAP B segment 13 


90 nts 


SEQ ID NO: 339 


Polypeptide encoded by SEQ ID NO: 338 


30 aa 


SEQ ID NO: 340 


GAP B segment 14 


90 nts 


SEQ ID NO: 341 


Polypeptide encoded by SEQ ID NO: 340 


30 aa 


SEQ ID NO: 342 


GAP B segment 15 


90 nts 


SEQ ID NO: 343 


Polypeptide encoded by SEQ ID NO: 342 


30 aa 


SEQ ID NO: 344 


GAP B segment 16 


90 nts 


SEQ ID NO: 345 


Polypeptide encoded by SEQ ID NO: 344 


30 aa 


SEQ ID NO: 346 


GAP B segment 17 


90 nts 


SEQ ID NO: 347 


Polypeptide encoded by SEQ ID NO: 346 


30 aa 


SEQ ID NO: 348 


GAP B segment 18 


90 nts 


SEQ ID NO: 349 


Polypeptide encoded by SEQ ID NO: 348 


30 aa 


SEQ ID NO: 350 


GAP B segment 19 


90 nts 


SEQ ID NO: 351 


Polypeptide encoded by SEQ ID NO: 350 


30 aa 


SEQ ID NO: 352 


GAP B segment 20 


90 nts 


SEQ ID NO: 353 


Polypeptide encoded by SEQ ID NO: 352 


30 aa 


SEQ ID NO: 354 


GAP B segment 21 


90 nts 


SEQ ID NO: 355 


Polypeptide encoded by SEQ ID NO: 354 


30 aa 


SEQ ID NO: 356 


GAP B segment 22 


90 nts 


SEQ ID NO: 357 


Polypeptide encoded by SEQ ID NO: 356 


30 aa 


SEQ ID NO: 358 


GAP B segment 23 


90 nts 


SEQ ID NO: 359 


Polypeptide encoded by SEQ ID NO: 358 


30 aa 


SEQ ID NO: 360 


GAP B segment 24 


90nts 


SEQ ID NO: 361 


Polypeptide encoded by SEQ ID NO: 360 


30 aa 


SEQ ID NO: 362 


GAP B segment 25 


90nts 


SEQ ID NO: 363 


Polypeptide encoded by SEQ ID NO: 362 


30 aa 


SEQ ID NO: 364 


GAP B segment 26 


66 nts 


SEQ ID NO: 365 


Polypeptide encoded by SEQ ID NO: 364 


22 aa 


SEQ ID NO: 366 


NEF segment 1 


90 nts 


SEQ ID NO: 367 


Polypeptide encoded by SEQ ID NO: 366 


30 aa 


SEQ ID NO: 368 


NEF segment 2 


90 nts 


SEQ ID NO: 369 


Polypeptide encoded by SEQ ID NO: 368 


30 aa 


SEQ ID NO: 370 


NEF segment 3 


90 nts 


SEQ ID NO: 371 


Polypeptide encoded by SEQ ID NO: 370 


30 aa 


SEQ ID NO: 372 


NEF segment 4 


90 nts 


SEQ ID NO: 373 


Polypeptide encoded by SEQ ID NO: 372 


30 aa 


SEQ ID NO: 374 


NEF segment 5 


90 nts 


SEQ ID NO: 375 


Polypeptide encoded by SEQ ID NO: 374 


30 aa 


SEQ ID NO: 376 


NEF segment 6 


90 nts 


SEQ ID NO: 377 


Polypeptide encoded by SEQ ID NO: 376 


30 aa 


SEQ ID NO: 378 


NEF segment 7 


90 nts 


SEQ ID NO: 379 


Polypeptide encoded by SEQ ID NO: 378 


30 aa 


SEQ ID NO: 380 


NEF segment 8 


90 nts 


SEQ ID NO: 381 


Polypeptide encoded by SEQ ID NO: 380 


30 aa 


SEQ ID NO: 382 


NEF segment 9 


90 nts 


SEQ ID NO: 383 


Polypeptide encoded by SEQ ID NO: 382 


30 aa 


SEQ ID NO: 384 


NEF segment 10 


90nts 


SEQ ID NO: 385 


Polypeptide encoded by SEQ ID NO: 384 


30 aa 


SEQ ID NO: 386 


NEF segment 11 


90nts 


SEQ ID NO: 387 


Polypeptide encoded by SEQ ID NO: 386 


30 aa 


SEQ ID NO: 388 


NEF segment 12 


90nts 


SEQ ID NO: 389 


Polypeptide encoded by SEQ ID NO: 388 


30 aa 


SEQ ID NO: 390 


NEF segment 13 


78nts 


SEQ ID NO: 391 


Polypeptide encoded by SEQ ID NO: 390 


26 aa 


SEQ ID NO: 392 


HIV Cassette A1 


5703 nts 


SEQ ID NO: 393 


Polypeptide encoded by SEQ ID NO: 392 


1896 aa 


SEQ ID NO: 394 


HIV Cassette B1 


5685 nts 


SEQ ID NO: 395 


Polypeptide encoded by SEQ ID NO: 394 


1890 aa 


SEQ ID NO: 396 


HIV Cassette C1 


5925 nts 


SEQ ID NO: 397 


Polypeptide encoded by SEQ ID NO: 396 


1967 aa 


SEQ ID NO: 398 


HIV Cassette A2 


5703 nts 


SEQ ID NO: 399 


Polypeptide encoded by SEQ ID NO: 398 


1896 aa 


SEQ ID NO: 400 


HIV Cassette B2 


5685 nts 


SEQ ID NO: 401 


Polypeptide encoded by SEQ ID NO: 400 


1890 aa 


SEQ ID NO: 402 


HIV Cassette C2 


5925 nts 


SEQ ID NO: 403 


Polypeptide encoded by SEQ ID NO: 402 


1967 aa 


SEQ ID NO: 404 


HIV complete Savine 


17244 nts 


SEQ ID NO: 405 


Polypeptide encoded by SEQ ID NO: 404 


5747 aa 


SEQ ID NO: 406 


HepC1a consensus polyprotein sequence 


3011 aa 


SEQ ID NO: 407 


HepC1a segment 1 


90nts 


SEQ ID NO: 408 


Polypeptide encoded by SEQ ID NO: 407 


30 aa 


SEQ ID NO: 409 


HepC1a segment 2 


90nts 


SEQ ID NO: 410 


Polypeptide encoded by SEQ ID NO: 409 


30 aa 


SEQ ID NO: 411 


HepC1a segment 3 


90nts 


SEQ ID NO: 412 


Polypeptide encoded by SEQ ID NO: 411 


30 aa 


SEQ ID NO: 413 


HepC1a segment 4 


90nts 


SEQ ID NO: 414 


Polypeptide encoded by SEQ ID NO: 413 


30 aa 


SEQ ID NO: 415 


HepC1a segment 5 


90nts 


SEQ ID NO: 416 


Polypeptide encoded by SEQ ID NO: 415 


30 aa 


SEQ ID NO: 417 


HepC1a segment 6 


90nts 


SEQ ID NO: 418 


Polypeptide encoded by SEQ ID NO: 417 


30 aa 


SEQ ID NO: 419 


HepC1a segment 7 


90nts 


SEQ ID NO: 420 


Polypeptide encoded by SEQ ID NO: 419 


30 aa 


SEQ ID NO: 421 


HepC1a segment 8 


90nts 


SEQ ID NO: 422 


Polypeptide encoded by SEQ ID NO: 421 


30 aa 


SEQ ID NO: 423 


HepC1a segment 9 


90nts 


SEQ ID NO: 424 


Polypeptide encoded by SEQ ID NO: 423 


30 aa 


SEQ ID NO: 425 


HepC1a segment 10 


90 nts 


SEQ ID NO: 426 


Polypeptide encoded by SEQ ID NO: 425 


30 aa 


SEQ ID NO: 427 


HepC1a segment 11 


90 nts 


SEQ ID NO: 428 


Polypeptide encoded by SEQ ID NO: 427 


30 aa 


SEQ ID NO: 429 


HepC1a segment 12 


90 nts 


SEQ ID NO: 430 


Polypeptide encoded by SEQ ID NO: 429 


30 aa 



SEQ ID NO: 431 


HepC1a segment 13 


90nts 


SEQ ID NO: 432 


Polypeptide encoded by SEQ ID NO: 431 


30 aa 


SEQ ID NO: 433 


HepC1a segment 14 


90nts 


SEQ ID NO: 434 


Polypeptide encoded by SEQ ID NO: 433 


30 aa 


SEQ ID NO: 435 


HepC1a segment 15 


90nts 


SEQ ID NO: 436 


Polypeptide encoded by SEQ ID NO: 435 


30 aa 


SEQ ID NO: 437 


HepC1a segment 16 


90nts 


SEQ ID NO: 438 


Polypeptide encoded by SEQ ID NO: 437 


30 aa 


SEQ ID NO: 439 


HepC1a segment 17 


90nts 


SEQ ID NO: 440 


Polypeptide encoded by SEQ ID NO: 439 


30 aa 


SEQ ID NO: 441 


HepC1a segment 18 


90nts 


SEQ ID NO: 442 


Polypeptide encoded by SEQ ID NO: 441 


30 aa 


SEQ ID NO: 443 


HepC1a segment 19 


90nts 


SEQ ID NO: 444 


Polypeptide encoded by SEQ ID NO: 443 


30 aa 


SEQ ID NO: 445 


HepC1a segment 20 


90nts 


SEQ ID NO: 446 


Polypeptide encoded by SEQ ID NO: 445 


30 aa 


SEQ ID NO: 447 


HepC1a segment 21 


90 nts 


SEQ ID NO: 448 


Polypeptide encoded by SEQ ID NO: 447 


30 aa 


SEQ ID NO: 449 


HepC1a segment 22 


90nts 


SEQ ID NO: 450 


Polypeptide encoded by SEQ ID NO: 449 


30 aa 


SEQ ID NO: 451 


HepC1a segment 23 


90nts 


SEQ ID NO: 452 


Polypeptide encoded by SEQ ID NO: 451 


30 aa 


SEQ ID NO: 453 


HepC1a segment 24 


90nts 


SEQ ID NO: 454 


Polypeptide encoded by SEQ ID NO: 453 


30 aa 


SEQ ID NO: 455 


HepC1a segment 25 


90nts 


SEQ ID NO: 456 


Polypeptide encoded by SEQ ID NO: 455 


30 aa 


SEQ ID NO: 457 


HepC1a segment 26 


90nts 


SEQ ID NO: 458 


Polypeptide encoded by SEQ ID NO: 457 


30 aa 


SEQ ID NO: 459 


HepC1a segment 27 


90nts 


SEQ ID NO: 460 


Polypeptide encoded by SEQ ID NO: 459 


30 aa 


SEQ ID NO: 461 


HepC1a segment 28 


90nts 


SEQ ID NO: 462 


Polypeptide encoded by SEQ ID NO: 461 


30 aa 


SEQ ID NO: 463 


HepC1a segment 29 


90nts 


SEQ ID NO: 464 


Polypeptide encoded by SEQ ID NO: 463 


30 aa 


SEQ ID NO: 465 


HepC1a segment 30 


90nts 


SEQ ID NO: 466 


Polypeptide encoded by SEQ ID NO: 465 


30 aa 


SEQ ID NO: 467 


HepC1a segment 31 


90nts 


SEQ ID NO: 468 


Polypeptide encoded by SEQ ID NO: 467 


30 aa 


SEQ ID NO: 469 


HepC1a segment 32 


90nts 


SEQ ID NO: 470 


Polypeptide encoded by SEQ ID NO: 469 


30 aa 


SEQ ID NO: 471 


HepC1a segment 33 


90nts 


SEQ ID NO: 472 


Polypeptide encoded by SEQ ID NO: 471 


30 aa 


SEQ ID NO: 473 


HepC1a segment 34 


90nts 


SEQ ID NO: 474 


Polypeptide encoded by SEQ ID NO: 473 


30 aa 


SEQ ID NO: 475 


HepC1a segment 35 


90nts 


SEQ ID NO: 476 


Polypeptide encoded by SEQ ID NO: 475 


30 aa 


SEQ ID NO: 477 


HepC1a segment 36 


90nts 


SEQ ID NO: 478 


Polypeptide encoded by SEQ ID NO: 477 


30 aa


SEQ ID NO: 479 


HepC1a segment 37


90nts


SEQ ID NO: 480 


Polypeptide encoded by SEQ ID NO: 479 


30 aa 


SEQ ID NO: 481 


HepC1a segment 38


90nts 


SEQ ID NO: 482 


Polypeptide encoded by SEQ ID NO: 481 


30 aa 


SEQ ID NO: 483 


HepC1a segment 39


90nts 


SEQ ID NO: 484 


Polypeptide encoded by SEQ ID NO: 483 


30 aa 


SEQ ID NO: 485 


HepC1a segment 40


90nts 


SEQ ID NO: 486 


Polypeptide encoded by SEQ ID NO: 485 


30 aa 


SEQ ID NO: 487 


HepC1a segment 41


90nts 


SEQ ID NO: 488 


Polypeptide encoded by SEQ ID NO: 487 


30 aa 


SEQ ID NO: 489 


HepC1a segment 42


90nts 


SEQ ID NO: 490 


Polypeptide encoded by SEQ ID NO: 489 


30 aa 


SEQ ID NO: 491 


HepC1a segment 43


90nts 


SEQ ID NO: 492 


Polypeptide encoded by SEQ ID NO: 491 


30 aa 


SEQ ID NO: 493 


HepC1a segment 44


90nts 


SEQ ID NO: 494 


Polypeptide encoded by SEQ ID NO: 493 


30 aa 


SEQ ID NO: 495 


HepC1a segment 45


90nts 


SEQ ID NO: 496 


Polypeptide encoded by SEQ ID NO: 495 


30 aa 


SEQ ID NO: 497 


HepC1a segment 46


90nts 


SEQ ID NO: 498 


Polypeptide encoded by SEQ ID NO: 497 


30 aa 


SEQ ID NO: 499 


HepC1a segment 47


90nts 


SEQ ID NO: 500 


Polypeptide encoded by SEQ ID NO: 499 


30 aa 


SEQ ID NO: 501 


HepC1a segment 48


90nts 


SEQ ID NO: 502 


Polypeptide encoded by SEQ ID NO: 501 


30 aa


SEQ ID NO: 503 


HepC1a segment 49


90nts 


SEQ ID NO: 504 


Polypeptide encoded by SEQ ID NO: 503 


30 aa 


SEQ ID NO: 505 


HepC1a segment 50


90nts 


SEQ ID NO: 506 


Polypeptide encoded by SEQ ID NO: 505 


30 aa 


SEQ ID NO: 507 


HepC1a segment 51


90nts 


SEQ ID NO: 508 


Polypeptide encoded by SEQ ID NO: 507 


30 aa 


SEQ ID NO: 509 


HepC1a segment 52


90nts 


SEQ ID NO: 510 


Polypeptide encoded by SEQ ID NO: 509 


30 aa 


SEQ ID NO: 511 


HepC1a segment 53


90nts 


SEQ ID NO: 512 


Polypeptide encoded by SEQ ID NO: 511 


30 aa 


SEQ ID NO: 513 


HepC1a segment 54


90nts 


SEQ ID NO: 514 


Polypeptide encoded by SEQ ID NO: 513 


30 aa 


SEQ ID NO: 515 


HepC1a segment 55


90nts 


SEQ ID NO: 516 


Polypeptide encoded by SEQ ID NO: 515 


30 aa 


SEQ ID NO: 517 


HepC1a segment 56


90nts 


SEQ ID NO: 518 


Polypeptide encoded by SEQ ID NO: 517 


30 aa 


SEQ ID NO: 519 


HepC1a segment 57


90nts 


SEQ ID NO: 520 


Polypeptide encoded by SEQ ID NO: 519 


30 aa 


SEQ ID NO: 521 


HepC1a segment 58


90nts 


SEQ ID NO: 522 


Polypeptide encoded by SEQ ID NO: 521 


30 aa 


SEQ ID NO: 523 


HepC1a segment 59


90nts 


SEQ ID NO: 524 


Polypeptide encoded by SEQ ID NO: 523 


30 aa 


SEQ ID NO: 525 


HepC1a segment 60


90nts 


SEQ ID NO: 526 


Polypeptide encoded by SEQ ID NO: 525 


30 aa


SEQ ID NO: 527 


HepC1a segment 61


90nts 


SEQ ID NO: 528 


Polypeptide encoded by SEQ ID NO: 527 


30 aa 


SEQ ID NO: 529 


HepC1a segment 62


90nts 


SEQ ID NO: 530 


Polypeptide encoded by SEQ ID NO: 529 


30 aa 


SEQ ID NO: 531 


HepC1a segment 63


90nts 


SEQ ID NO: 532 


Polypeptide encoded by SEQ ID NO: 531


30 aa 


SEQ ID NO: 533 


HepC1a segment 64


90nts 


SEQ ID NO: 534 


Polypeptide encoded by SEQ ID NO: 533 


30 aa 


SEQ ID NO: 535 


HepC1a segment 65


90nts 


SEQ ID NO: 536 


Polypeptide encoded by SEQ ID NO: 535 


30 aa 


SEQ ID NO: 537 


HepC1a segment 66


90nts 


SEQ ID NO: 538 


Polypeptide encoded by SEQ ID NO: 537 


30 aa 


SEQ ID NO: 539 


HepC1a segment 67


90nts 


SEQ ID NO: 540 


Polypeptide encoded by SEQ ID NO: 539 


30 aa 


SEQ ID NO: 541 


HepC1a segment 68


90nts 


SEQ ID NO: 542 


Polypeptide encoded by SEQ ID NO: 541 


30 aa 


SEQ ID NO: 543 


HepC1a segment 69


90nts 


SEQ ID NO: 544 


Polypeptide encoded by SEQ ID NO: 543 


30 aa 


SEQ ID NO: 545 


HepC1a segment 70


90nts 


SEQ ID NO: 546 


Polypeptide encoded by SEQ ID NO: 545


30 aa 


SEQ ID NO: 547 


HepC1a segment 71


90nts 


SEQ ID NO: 548 


Polypeptide encoded by SEQ ID NO: 547 


30 aa 


SEQ ID NO: 549 


HepC1a segment 72


90nts 


SEQ ID NO: 550 


Polypeptide encoded by SEQ ID NO: 549 


30 aa


SEQ ID NO: 551


HepC1a segment 73


90nts 


SEQ ID NO: 552


Polypeptide encoded by SEQ ID NO: 551 


30 aa 


SEQ ID NO: 553 


HepC1a segment 74


90nts 


SEQ ID NO: 554 


Polypeptide encoded by SEQ ID NO: 553 


30 aa 


SEQ ID NO: 555 


HepC1a segment 75


90nts 


SEQ ID NO: 556 


Polypeptide encoded by SEQ ID NO: 555 


30 aa 


SEQ ID NO: 557 


HepC1a segment 76


90nts 


SEQ ID NO: 558 


Polypeptide encoded by SEQ ID NO: 557 


30 aa 


SEQ ID NO: 559 


HepC1a segment 77


90nts 


SEQ ID NO: 560 


Polypeptide encoded by SEQ ID NO: 559 


30 aa 


SEQ ID NO: 561 


HepC1a segment 78


90nts 


SEQ ID NO: 562 


Polypeptide encoded by SEQ ID NO: 561 


30 aa 


SEQ ID NO: 563 


HepC1a segment 79


90nts 


SEQ ID NO: 564 


Polypeptide encoded by SEQ ID NO: 563 


30 aa 


SEQ ID NO: 565 


HepC1a segment 80


90nts 


SEQ ID NO: 566 


Polypeptide encoded by SEQ ID NO: 565 


30 aa 


SEQ ID NO: 567


HepC1a segment 81


90nts 


SEQ ID NO: 568 


Polypeptide encoded by SEQ ID NO: 567


30 aa 


SEQ ID NO: 569


HepC1a segment 82


90nts 


SEQ ID NO: 570 


Polypeptide encoded by SEQ ID NO: 569 


30 aa 


SEQ ID NO: 571


HepC1a segment 83


90nts 


SEQ ID NO: 572


Polypeptide encoded by SEQ ID NO: 571 


30 aa 


SEQ ID NO: 573 


HepC1a segment 84


90nts 


SEQ ID NO: 574 


Polypeptide encoded by SEQ ID NO: 573 


30 aa


SEQ ID NO: 575 


HepC1a segment 85


90nts 


SEQ ID NO: 576 


Polypeptide encoded by SEQ ID NO: 575 


30 aa 


SEQ ID NO: 577 


HepC1a segment 86


90nts 


SEQ ID NO: 578 


Polypeptide encoded by SEQ ID NO: 577 


30 aa 


SEQ ID NO: 579 


HepC1a segment 87


90nts 


SEQ ID NO: 580 


Polypeptide encoded by SEQ ID NO: 579 


30 aa 


SEQ ID NO: 581 


HepC1a segment 88


90nts 


SEQ ID NO: 582 


Polypeptide encoded by SEQ ID NO: 581 


30 aa 


SEQ ID NO: 583 


HepC1a segment 89


90nts 


SEQ ID NO: 584 


Polypeptide encoded by SEQ ID NO: 583 


30 aa 


SEQ ID NO: 585 


HepC1a segment 90


90nts 


SEQ ID NO: 586 


Polypeptide encoded by SEQ ID NO: 585 


30 aa 


SEQ ID NO: 587 


HepC1a segment 91


90nts 


SEQ ID NO: 588 


Polypeptide encoded by SEQ ID NO: 587 


30 aa 


SEQ ID NO: 589 


HepC1a segment 92


90nts 


SEQ ID NO: 590


Polypeptide encoded by SEQ ID NO: 589 


30 aa 


SEQ ID NO: 591 


HepC1a segment 93


90nts 


SEQ ID NO: 592 


Polypeptide encoded by SEQ ID NO: 591 


30 aa 


SEQ ID NO: 593 


HepC1a segment 94


90nts 


SEQ ID NO: 594 


Polypeptide encoded by SEQ ID NO: 593 


30 aa 


SEQ ID NO: 595 


HepC1a segment 95


90 nts 


SEQ ID NO: 596 


Polypeptide encoded by SEQ ID NO: 595 


30 aa 


SEQ ID NO: 597 


HepC1a segment 96


90 nts 


SEQ ID NO: 598 


Polypeptide encoded by SEQ ID NO: 597 


30 aa


SEQ ID NO: 599 


HepC1a segment 97


90nts 


SEQ ID NO: 600 


Polypeptide encoded by SEQ ID NO: 599 


30 aa 


SEQ ID NO: 601 


HepC1a segment 98


90nts 


SEQ ID NO: 602 


Polypeptide encoded by SEQ ID NO: 601


30 aa 


SEQ ID NO: 603 


HepC1a segment 99


90nts 


SEQ ID NO: 604 


Polypeptide encoded by SEQ ID NO: 603 


30 aa 


SEQ ID NO: 605 


HepC1a segment 100


90nts 


SEQ ID NO: 606 


Polypeptide encoded by SEQ ID NO: 605 


30 aa 


SEQ ID NO: 607 


HepC1a segment 101


90nts 


SEQ ID NO: 608 


Polypeptide encoded by SEQ ID NO: 607 


30 aa 


SEQ ID NO: 609 


HepC1a segment 102


90nts 


SEQ ID NO: 610 


Polypeptide encoded by SEQ ID NO: 609 


30 aa 


SEQ ID NO: 611 


HepC1a segment 103


90nts 


SEQ ID NO: 612 


Polypeptide encoded by SEQ ID NO: 611


30 aa 


SEQ ID NO: 613 


HepC1a segment 104


90nts 


SEQ ID NO: 614 


Polypeptide encoded by SEQ ID NO: 613 


30 aa 


SEQ ID NO: 615 


HepC1a segment 105


90nts 


SEQ ID NO: 616 


Polypeptide encoded by SEQ ID NO: 615 


30 aa 


SEQ ID NO: 617 


HepC1a segment 106


90nts 


SEQ ID NO: 618


Polypeptide encoded by SEQ ID NO: 617 


30 aa 


SEQ ID NO: 619 


HepC1a segment 107


90nts 


SEQ ID NO: 620 


Polypeptide encoded by SEQ ID NO: 619 


30 aa 


SEQ ID NO: 621 


HepC1a segment 108


90nts 


SEQ ID NO: 622 


Polypeptide encoded by SEQ ID NO: 621 


30 aa


SEQ ID NO: 623 


HepC1a segment 109


90nts 


SEQ ID NO: 624


Polypeptide encoded by SEQ ID NO: 623 


30 aa 


SEQ ID NO: 625 


HepC1a segment 110


90nts 


SEQ ID NO: 626 


Polypeptide encoded by SEQ ID NO: 625 


30 aa 


SEQ ID NO: 627 


HepC1a segment 111


90nts 


SEQ ID NO: 628 


Polypeptide encoded by SEQ ID NO: 627 


30 aa 


SEQ ID NO: 629 


HepC1a segment 112


90nts 


SEQ ID NO: 630 


Polypeptide encoded by SEQ ID NO: 629 


30 aa 


SEQ ID NO: 631 


HepC1a segment 113


90nts 


SEQ ID NO: 632 


Polypeptide encoded by SEQ ID NO: 631


30 aa 


SEQ ID NO: 633 


HepC1a segment 114


90nts 


SEQ ID NO: 634 


Polypeptide encoded by SEQ ID NO: 633 


30 aa 


SEQ ID NO: 635 


HepC1a segment 115


90nts 


SEQ ID NO: 636 


Polypeptide encoded by SEQ ID NO: 635


30 aa 


SEQ ID NO: 637 


HepC1a segment 116


90nts 


SEQ ID NO: 638 


Polypeptide encoded by SEQ ID NO: 637 


30 aa 


SEQ ID NO: 639 


HepC1a segment 117


90nts 


SEQ ID NO: 640 


Polypeptide encoded by SEQ ID NO: 639 


30 aa 


SEQ ID NO: 641 


HepC1a segment 118


90nts 


SEQ ID NO: 642 


Polypeptide encoded by SEQ ID NO: 641


30 aa 


SEQ ID NO: 643 


HepC1a segment 119


90nts 


SEQ ID NO: 644 


Polypeptide encoded by SEQ ID NO: 643


30 aa 


SEQ ID NO: 645 


HepC1a segment 120


90nts 


SEQ ID NO: 646 


Polypeptide encoded by SEQ ID NO: 645 


30 aa


SEQ ID NO: 647 


HepC1a segment 121


90nts 


SEQ ID NO: 648 


Polypeptide encoded by SEQ ID NO: 647 


30 aa 


SEQ ID NO: 649 


HepC1a segment 122


90nts 


SEQ ID NO: 650 


Polypeptide encoded by SEQ ID NO: 649 


30 aa 


SEQ ID NO: 651 


HepC1a segment 123


90nts 


SEQ ID NO: 652 


Polypeptide encoded by SEQ ID NO: 651 


30 aa 


SEQ ID NO: 653 


HepC1a segment 124


90nts 


SEQ ID NO: 654 


Polypeptide encoded by SEQ ID NO: 653 


30 aa 


SEQ ID NO: 655 


HepC1a segment 125


90nts 


SEQ ID NO: 656 


Polypeptide encoded by SEQ ID NO: 655 


30 aa 


SEQ ID NO: 657 


HepC1a segment 126


90nts 


SEQ ID NO: 658 


Polypeptide encoded by SEQ ID NO: 657 


30 aa 


SEQ ID NO: 659 


HepC1a segment 127


90nts 


SEQ ID NO: 660 


Polypeptide encoded by SEQ ID NO: 659 


30 aa 


SEQ ID NO: 661 


HepC1a segment 128


90nts 


SEQ ID NO: 662 


Polypeptide encoded by SEQ ID NO: 661 


30 aa 


SEQ ID NO: 663 


HepC1a segment 129


90nts 


SEQ ID NO: 664 


Polypeptide encoded by SEQ ID NO: 663 


30 aa 


SEQ ID NO: 665 


HepC1a segment 130


90nts 


SEQ ID NO: 666 


Polypeptide encoded by SEQ ID NO: 665 


30 aa 


SEQ ID NO: 667 


HepC1a segment 131


90nts 


SEQ ID NO: 668 


Polypeptide encoded by SEQ ID NO: 667 


30 aa 


SEQ ID NO: 669 


HepC1a segment 132


90nts 


SEQ ID NO: 670 


Polypeptide encoded by SEQ ID NO: 669 


30 aa


SEQ ID NO: 671 


HepC1a segment 133


90nts 


SEQ ID NO: 672 


Polypeptide encoded by SEQ ID NO: 671 


30 aa 


SEQ ID NO: 673 


HepC1a segment 134


90nts 


SEQ ID NO: 674 


Polypeptide encoded by SEQ ID NO: 673 


30 aa 


SEQ ID NO: 675 


HepC1a segment 135


90nts 


SEQ ID NO: 676 


Polypeptide encoded by SEQ ID NO: 675 


30 aa 


SEQ ID NO: 677 


HepC1a segment 136


90nts 


SEQ ID NO: 678 


Polypeptide encoded by SEQ ID NO: 677 


30 aa 


SEQ ID NO: 679 


HepC1a segment 137


90nts 


SEQ ID NO: 680 


Polypeptide encoded by SEQ ID NO: 679 


30 aa 


SEQ ID NO: 681 


HepC1a segment 138


90nts 


SEQ ID NO: 682 


Polypeptide encoded by SEQ ID NO: 681 


30 aa 


SEQ ID NO: 683 


HepC1a segment 139


90 nts 


SEQ ID NO: 684 


Polypeptide encoded by SEQ ID NO: 683 


30 aa 


SEQ ID NO: 685 


HepC1a segment 140


90 nts 


SEQ ID NO: 686 


Polypeptide encoded by SEQ ID NO: 685 


30 aa 


SEQ ID NO: 687 


HepC1a segment 141


90 nts 


SEQ ID NO: 688 


Polypeptide encoded by SEQ ID NO: 687 


30 aa 


SEQ ID NO: 689 


HepC1a segment 142


90 nts 


SEQ ID NO: 690 


Polypeptide encoded by SEQ ID NO: 689 


30 aa 


SEQ ID NO: 691 


HepC1a segment 143


90 nts 


SEQ ID NO: 692 


Polypeptide encoded by SEQ ID NO: 691 


30 aa 


SEQ ID NO: 693 


HepC1a segment 144


90 nts 


SEQ ID NO: 694 


Polypeptide encoded by SEQ ID NO: 693 


30 aa


SEQ ID NO: 695 


HepC1a segment 145


90nts 


SEQ ID NO: 696 


Polypeptide encoded by SEQ ID NO: 695 


30 aa 


SEQ ID NO: 697 


HepC1a segment 146


90nts 


SEQ ID NO: 698 


Polypeptide encoded by SEQ ID NO: 697 


30 aa 


SEQ ID NO: 699 


HepC1a segment 147


90nts 


SEQ ID NO: 700 


Polypeptide encoded by SEQ ID NO: 699 


30 aa 


SEQ ID NO: 701 


HepC1a segment 148


90nts 


SEQ ID NO: 702 


Polypeptide encoded by SEQ ID NO: 701 


30 aa 


SEQ ID NO: 703 


HepC1a segment 149


90 nts 


SEQ ID NO: 704 


Polypeptide encoded by SEQ ID NO: 703 


30 aa 


SEQ ID NO: 705 


HepC1a segment 150


90 nts 


SEQ ID NO: 706 


Polypeptide encoded by SEQ ID NO: 705 


30 aa 


SEQ ID NO: 707 


HepC1a segment 151


90 nts 


SEQ ID NO: 708 


Polypeptide encoded by SEQ ID NO: 707 


30 aa 


SEQ ID NO: 709 


HepC1a segment 152


90 nts 


SEQ ID NO: 710 


Polypeptide encoded by SEQ ID NO: 709 


30 aa 


SEQ ID NO: 711 


HepC1a segment 153


90 nts 


SEQ ID NO: 712 


Polypeptide encoded by SEQ ID NO: 711 


30 aa 


SEQ ID NO: 713 


HepC1a segment 154


90 nts 


SEQ ID NO: 714 


Polypeptide encoded by SEQ ID NO: 713 


30 aa 


SEQ ID NO: 715 


HepC1a segment 155


90 nts 


SEQ ID NO: 716 


Polypeptide encoded by SEQ ID NO: 715


30 aa 


SEQ ID NO: 717 


HepC1a segment 156


90 nts 


SEQ ID NO: 718 


Polypeptide encoded by SEQ ID NO: 717 


30 aa


SEQ ID NO: 719 


HepC1a segment 157


90nts 


SEQ ID NO: 720


Polypeptide encoded by SEQ ID NO: 719 


30 aa 


SEQ ID NO: 721 


HepC1a segment 158


90nts 


SEQ ID NO: 722 


Polypeptide encoded by SEQ ID NO: 721 


30 aa 


SEQ ID NO: 723 


HepC1a segment 159


90nts 


SEQ ID NO: 724 


Polypeptide encoded by SEQ ID NO: 723 


30 aa 


SEQ ID NO: 725 


HepC1a segment 160


90nts 


SEQ ID NO: 726 


Polypeptide encoded by SEQ ID NO: 725 


30 aa 


SEQ ID NO: 727 


HepC1a segment 161


90nts 


SEQ ID NO: 728 


Polypeptide encoded by SEQ ID NO: 727 


30 aa 


SEQ ID NO: 729 


HepC1a segment 162


90nts 


SEQ ID NO: 730 


Polypeptide encoded by SEQ ID NO: 729 


30 aa 


SEQ ID NO: 731 


HepC1a segment 163


90nts 


SEQ ID NO: 732 


Polypeptide encoded by SEQ ID NO: 731 


30 aa 


SEQ ID NO: 733 


HepC1a segment 164


90nts 


SEQ ID NO: 734 


Polypeptide encoded by SEQ ID NO: 733 


30 aa 


SEQ ID NO: 735 


HepC1a segment 165


90nts 


SEQ ID NO: 736 


Polypeptide encoded by SEQ ID NO: 735 


30 aa 


SEQ ID NO: 737 


HepC1a segment 166


90nts 


SEQ ID NO: 738 


Polypeptide encoded by SEQ ID NO: 737 


30 aa 


SEQ ID NO: 739 


HepC1a segment 167


90nts 


SEQ ID NO: 740 


Polypeptide encoded by SEQ ID NO: 739 


30 aa 


SEQ ID NO: 741 


HepC1a segment 168


90nts 


SEQ ID NO: 742 


Polypeptide encoded by SEQ ID NO: 741 


30 aa


SEQ ID NO: 743


HepC1a segment 169


90nts 


SEQ ID NO: 744 


Polypeptide encoded by SEQ ID NO: 743 


30 aa 


SEQ ID NO: 745 


HepC1a segment 170


90nts 


SEQ ID NO: 746 


Polypeptide encoded by SEQ ID NO: 745 


30 aa 


SEQ ID NO: 747 


HepC1a segment 171


90nts 


SEQ ID NO: 748 


Polypeptide encoded by SEQ ID NO: 747 


30 aa 


SEQ ID NO: 749 


HepC1a segment 172


90nts 


SEQ ID NO: 750 


Polypeptide encoded by SEQ ID NO: 749 


30 aa 


SEQ ID NO: 751


HepC1a segment 173


90nts 


SEQ ID NO: 752 


Polypeptide encoded by SEQ ID NO: 751 


30 aa 


SEQ ID NO: 753 


HepC1a segment 174


90nts 


SEQ ID NO: 754 


Polypeptide encoded by SEQ ID NO: 753 


30 aa 


SEQ ID NO: 755 


HepC1a segment 175


90nts 


SEQ ID NO: 756 


Polypeptide encoded by SEQ ID NO: 755 


30 aa 


SEQ ID NO: 757 


HepC1a segment 176


90nts 


SEQ ID NO: 758 


Polypeptide encoded by SEQ ID NO: 757 


30 aa 


SEQ ID NO: 759 


HepC1a segment 177


90nts 


SEQ ID NO: 760 


Polypeptide encoded by SEQ ID NO: 759 


30 aa 


SEQ ID NO: 761


HepC1a segment 178


90nts 


SEQ ID NO: 762 


Polypeptide encoded by SEQ ID NO: 761 


30 aa 


SEQ ID NO: 763 


HepC1a segment 179


90nts 


SEQ ID NO: 764 


Polypeptide encoded by SEQ ID NO: 763 


30 aa 


SEQ ID NO: 765 


HepC1a segment 180


90nts 


SEQ ID NO: 766 


Polypeptide encoded by SEQ ID NO: 765 


30 aa


SEQ ID NO: 767 


HepC1a segment 181


90 nts 


SEQ ID NO: 768


Polypeptide encoded by SEQ ID NO: 767 


30 aa 


SEQ ID NO: 769 


HepC1a segment 182


90 nts 


SEQ ID NO: 770 


Polypeptide encoded by SEQ ID NO: 769 


30 aa 


SEQ ID NO: 771 


HepC1a segment 183


90 nts 


SEQ ID NO: 772 


Polypeptide encoded by SEQ ID NO: 771 


30 aa 


SEQ ID NO: 773 


HepC1a segment 184


90 nts 


SEQ ID NO: 774 


Polypeptide encoded by SEQ ID NO: 773 


30 aa


SEQ ID NO: 775 


HepC1a segment 185


90 nts 


SEQ ID NO: 776 


Polypeptide encoded by SEQ ID NO: 775 


30 aa 


SEQ ID NO: 777 


HepC1a segment 186


90 nts 


SEQ ID NO: 778 


Polypeptide encoded by SEQ ID NO: 777 


30 aa 


SEQ ID NO: 779 


HepC1a segment 187


90 nts 


SEQ ID NO: 780 


Polypeptide encoded by SEQ ID NO: 779 


30 aa 


SEQ ID NO: 781


HepC1a segment 188


90 nts 


SEQ ID NO: 782 


Polypeptide encoded by SEQ ID NO: 781 


30 aa 


SEQ ID NO: 783 


HepC1a segment 189


90 nts 


SEQ ID NO: 784 


Polypeptide encoded by SEQ ID NO: 783 


30 aa 


SEQ ID NO: 785 


HepC1a segment 190


90 nts 


SEQ ID NO: 786 


Polypeptide encoded by SEQ ID NO: 785 


30 aa 


SEQ ID NO: 787 


HepC1a segment 191


90 nts 


SEQ ID NO: 788 


Polypeptide encoded by SEQ ID NO: 787 


30 aa 


SEQ ID NO: 789 


HepC1a segment 192


90 nts 


SEQ ID NO: 790 


Polypeptide encoded by SEQ ID NO: 789 


30 aa


SEQ ID NO: 791 


HepC1a segment 193


90nts 


SEQ ID NO: 792 


Polypeptide encoded by SEQ ID NO: 791 


30 aa 


SEQ ID NO: 793 


HepC1a segment 194


90nts 


SEQ ID NO: 794 


Polypeptide encoded by SEQ ID NO: 793 


30 aa 


SEQ ID NO: 795 


HepC1a segment 195


90nts 


SEQ ID NO: 796 


Polypeptide encoded by SEQ ID NO: 795 


30 aa 


SEQ ID NO: 797 


HepC1a segment 196


90nts 


SEQ ID NO: 798 


Polypeptide encoded by SEQ ID NO: 797 


30 aa 


SEQ ID NO: 799 


HepC1a segment 197


90nts 


SEQ ID NO: 800 


Polypeptide encoded by SEQ ID NO: 799 


30 aa 


SEQ ID NO: 801 


HepC1a segment 198


90nts 


SEQ ID NO: 802 


Polypeptide encoded by SEQ ID NO: 801 


30 aa 


SEQ ID NO: 803 


HepC1a segment 199


90nts 


SEQ ID NO: 804 


Polypeptide encoded by SEQ ID NO: 803 


30 aa 


SEQ ID NO: 805 


HepC1a segment 200


90nts 


SEQ ID NO: 806 


Polypeptide encoded by SEQ ID NO: 805 


30 aa 


SEQ ID NO: 807 


HepC1a segment 201


45nts 


SEQ ID NO: 808 


Polypeptide encoded by SEQ ID NO: 807 


15 aa 


SEQ ID NO: 809


HepC1a scrambled


17955 nts 


SEQ ID NO: 810 


Polypeptide encoded by SEQ ID NO: 809 


5985 aa 


SEQ ID NO: 811 


HepC Cassette A 


6065 nts 


SEQ ID NO: 812 


Polypeptide encoded by SEQ ID NO: 811 


2011 aa 


SEQ ID NO: 813 


HepC Cassette B 


6069 nts 


SEQ ID NO: 814 


Polypeptide encoded by SEQ ID NO: 813 


2010 aa


SEQ ID NO: 815


HepC Cassette C 


6030 nts 


SEQ ID NO: 816


Polypeptide encoded by SEQ ID NO: 815 


1997 aa 


SEQ ID NO: 817


gp100 consensus polypeptide


661 aa 


SEQ ID NO: 818


MART consensus polypeptide 


118 aa 


SEQ ID NO: 819


TRP-1 consensus polypeptide 


248 aa 


SEQ ID NO: 820 


Tyros consensus polypeptide 


529 aa 


SEQ ID NO: 821 


TRP2 consensus polypeptide 


519 aa 


SEQ ID NO: 822 


MC1R consensus polypeptide


317 aa 


SEQ ID NO: 823 


MUC1F consensus polypeptide


125 aa 


SEQ ID NO: 824 


MUC1R consensus polypeptide


312 aa 


SEQ ID NO: 825 


BAGE consensus polypeptide 


43 aa 


SEQ ID NO: 826 


GAGE-1 consensus polypeptide 


138 aa 


SEQ ID NO: 827 


gp100In4 consensus polypeptide


51 aa 


SEQ ID NO: 828 


MAGE-1 consensus polypeptide 


309 aa 


SEQ ID NO: 829 


MAGE-3 consensus polypeptide 


314 aa 


SEQ ID NO: 830 


PRAME consensus polypeptide


509 aa 


SEQ ID NO: 831


TRP2IN2 consensus polypeptide 


54 aa 


SEQ ID NO: 832 


NYNSO1a consensus polypeptide


180 aa 


SEQ ID NO: 833 


NYNSO1b consensus polypeptide


58 aa 


SEQ ID NO: 834 


LAGE1 consensus polypeptide


180 aa 


SEQ ID NO: 835 


gp100 segment 1


90 nts 


SEQ ID NO: 836 


Polypeptide encoded by SEQ ID NO: 835 


30 aa 


SEQ ID NO: 837 


gp100 segment 2


90 nts 


SEQ ID NO: 838 


Polypeptide encoded by SEQ ID NO: 837 


30 aa


SEQ ID NO: 839 


gp100 segment 3


90nts 


SEQ ID NO: 840 


Polypeptide encoded by SEQ ID NO: 839 


30 aa 


SEQ ID NO: 841 


gp100 segment 4


90nts 


SEQ ID NO: 842 


Polypeptide encoded by SEQ ID NO: 841 


30 aa 


SEQ ID NO: 843 


gp100 segment 5


90nts 


SEQ ID NO: 844 


Polypeptide encoded by SEQ ID NO: 843 


30 aa 


SEQ ID NO: 845 


gp100 segment 6


90nts 


SEQ ID NO: 846 


Polypeptide encoded by SEQ ID NO: 845 


30 aa 


SEQ ID NO: 847 


gp100 segment 7


90nts 


SEQ ID NO: 848 


Polypeptide encoded by SEQ ID NO: 847 


30 aa 


SEQ ID NO: 849 


gp100 segment 8


90nts 


SEQ ID NO: 850 


Polypeptide encoded by SEQ ID NO: 849 


30 aa 


SEQ ID NO: 851 


gp100 segment 9


90nts 


SEQ ID NO: 852 


Polypeptide encoded by SEQ ID NO: 851 


30 aa 


SEQ ID NO: 853 


gp100 segment 10


90nts 


SEQ ID NO: 854 


Polypeptide encoded by SEQ ID NO: 853 


30 aa 


SEQ ID NO: 855 


gp100 segment 11


90nts 


SEQ ID NO: 856 


Polypeptide encoded by SEQ ID NO: 855 


30 aa 


SEQ ID NO: 857 


gp100 segment 12


90nts 


SEQ ID NO: 858 


Polypeptide encoded by SEQ ID NO: 857 


30 aa 


SEQ ID NO: 859 


gp100 segment 13


90nts 


SEQ ID NO: 860 


Polypeptide encoded by SEQ ID NO: 859 


30 aa 


SEQ ID NO: 861 


gp100 segment 14


90nts 


SEQ ID NO: 862 


Polypeptide encoded by SEQ ID NO: 861 


30 aa


SEQ ID NO: 863 


gp100 segment 15


90 nts 


SEQ ID NO: 864 


Polypeptide encoded by SEQ ID NO: 863 


30 aa 


SEQ ID NO: 865 


gp100 segment 16


90 nts 


SEQ ID NO: 866 


Polypeptide encoded by SEQ ID NO: 865 


30 aa 


SEQ ID NO: 867 


gp100 segment 17


90 nts 


SEQ ID NO: 868 


Polypeptide encoded by SEQ ID NO: 867 


30 aa 


SEQ ID NO: 869 


gp100 segment 18


90 nts 


SEQ ID NO: 870 


Polypeptide encoded by SEQ ID NO: 869 


30 aa 


SEQ ID NO: 871 


gp100 segment 19


90 nts 


SEQ ID NO: 872 


Polypeptide encoded by SEQ ID NO: 871 


30 aa 


SEQ ID NO: 873 


gp100 segment 20


90 nts 


SEQ ID NO: 874 


Polypeptide encoded by SEQ ID NO: 873 


30 aa 


SEQ ID NO: 875 


gp100 segment 21


90 nts 


SEQ ID NO: 876 


Polypeptide encoded by SEQ ID NO: 875 


30 aa 


SEQ ID NO: 877 


gp100 segment 22


90 nts 


SEQ ID NO: 878 


Polypeptide encoded by SEQ ID NO: 877 


30 aa 


SEQ ID NO: 879 


gp100 segment 23


90 nts 


SEQ ID NO: 880 


Polypeptide encoded by SEQ ID NO: 879 


30 aa 


SEQ ID NO: 881 


gp100 segment 24


90 nts 


SEQ ID NO: 882 


Polypeptide encoded by SEQ ID NO: 881 


30 aa 


SEQ ID NO: 883 


gp100 segment 25


90 nts 


SEQ ID NO: 884 


Polypeptide encoded by SEQ ID NO: 883 


30 aa 


SEQ ID NO: 885 


gp100 segment 26


90 nts 


SEQ ID NO: 886 


Polypeptide encoded by SEQ ID NO: 885 


30 aa


SEQ ID NO: 887 


gp100 segment 27


90nts 


SEQ ID NO: 888 


Polypeptide encoded by SEQ ID NO: 887 


30 aa 


SEQ ID NO: 889 


gp100 segment 28


90nts 


SEQ ID NO: 890 


Polypeptide encoded by SEQ ID NO: 889 


30 aa 


SEQ ID NO: 891 


gp100 segment 29


90nts 


SEQ ID NO: 892 


Polypeptide encoded by SEQ ID NO: 891


30 aa 


SEQ ID NO: 893 


gp100 segment 30


90nts 


SEQ ID NO: 894 


Polypeptide encoded by SEQ ID NO: 893 


30 aa 


SEQ ID NO: 895 


gp100 segment 31


90nts 


SEQ ID NO: 896 


Polypeptide encoded by SEQ ID NO: 895 


30 aa 


SEQ ID NO: 897 


gp100 segment 32


90nts 


SEQ ID NO: 898 


Polypeptide encoded by SEQ ID NO: 897 


30 aa 


SEQ ID NO: 899 


gp100 segment 33


90nts 


SEQ ID NO: 900 


Polypeptide encoded by SEQ ID NO: 899 


30 aa 


SEQ ID NO: 901 


gp100 segment 34


90nts 


SEQ ID NO: 902 


Polypeptide encoded by SEQ ID NO: 901 


30 aa 


SEQ ID NO: 903 


gp100 segment 35


90nts 


SEQ ID NO: 904 


Polypeptide encoded by SEQ ID NO: 903 


30 aa 


SEQ ID NO: 905 


gp100 segment 36


90nts 


SEQ ID NO: 906 


Polypeptide encoded by SEQ ID NO: 905 


30 aa 


SEQ ID NO: 907 


gp100 segment 37


90nts 


SEQ ID NO: 908 


Polypeptide encoded by SEQ ID NO: 907 


30 aa 


SEQ ID NO: 909 


gp100 segment 38


90nts 


SEQ ID NO: 910 


Polypeptide encoded by SEQ ID NO: 909 


30 aa


SEQ ID NO: 911


gp100 segment 39


90nts 


SEQ ID NO: 912 


Polypeptide encoded by SEQ ID NO: 911


30 aa 


SEQ ID NO: 913


gp100 segment 40


90nts 


SEQ ID NO: 914 


Polypeptide encoded by SEQ ID NO: 913 


30 aa 


SEQ ID NO: 915 


gp100 segment 41


90nts 


SEQ ID NO: 916 


Polypeptide encoded by SEQ ID NO: 915 


30 aa 


SEQ ID NO: 917 


gp100 segment 42


90nts 


SEQ ID NO: 918 


Polypeptide encoded by SEQ ID NO: 917 


30 aa 


SEQ ID NO: 919 


gp100 segment 43


90nts 


SEQ ID NO: 920 


Polypeptide encoded by SEQ ID NO: 919 


30 aa 


SEQ ID NO: 921 


gp100 segment 44


60nts 


SEQ ID NO: 922 


Polypeptide encoded by SEQ ID NO: 921 


20 aa 


SEQ ID NO: 923 


MART segment 1 


90nts 


SEQ ID NO: 924 


Polypeptide encoded by SEQ ID NO: 923 


30 aa 


SEQ ID NO: 925 


MART segment 2 


90nts 


SEQ ID NO: 926 


Polypeptide encoded by SEQ ID NO: 925 


30 aa 


SEQ ID NO: 927 


MART segment 3 


90nts 


SEQ ID NO: 928 


Polypeptide encoded by SEQ ID NO: 927 


30 aa 


SEQ ID NO: 929 


MART segment 4 


90nts 


SEQ ID NO: 930 


Polypeptide encoded by SEQ ID NO: 929 


30 aa 


SEQ ID NO: 931 


MART segment 5 


90nts 


SEQ ID NO: 932 


Polypeptide encoded by SEQ ID NO: 931 


30 aa 


SEQ ID NO: 933 


MART segment 6 


90nts 


SEQ ID NO: 934


Polypeptide encoded by SEQ ID NO: 933 


30 aa


SEQ ID NO: 935 


MART segment 7 


90nts 


SEQ ID NO: 936 


Polypeptide encoded by SEQ ID NO: 935 


30 aa 


SEQ ID NO: 937 


MART segment 8 


51 nts 


SEQ ID NO: 938 


Polypeptide encoded by SEQ ID NO: 937 


17 aa 


SEQ ID NO: 939 


trp-1 segment 1 


90 nts 


SEQ ID NO: 940 


Polypeptide encoded by SEQ ID NO: 939 


30 aa 


SEQ ID NO: 941 


trp-1 segment 2 


90 nts 


SEQ ID NO: 942 


Polypeptide encoded by SEQ ID NO: 941 


30 aa 


SEQ ID NO: 943 


trp-1 segment 3 


90 nts 


SEQ ID NO: 944 


Polypeptide encoded by SEQ ID NO: 943 


30 aa 


SEQ ID NO: 945 


trp-1 segment 4 


90 nts 


SEQ ID NO: 946 


Polypeptide encoded by SEQ ID NO: 945 


30 aa 


SEQ ID NO: 947 


trp-1 segment 5 


90 nts 


SEQ ID NO: 948 


Polypeptide encoded by SEQ ID NO: 947 


30 aa 


SEQ ID NO: 949 


trp-1 segment 6 


90 nts 


SEQ ID NO: 950 


Polypeptide encoded by SEQ ID NO: 949 


30 aa 


SEQ ID NO: 951 


trp-1 segment 7 


90 nts 


SEQ ID NO: 952 


Polypeptide encoded by SEQ ID NO: 951 


30 aa 


SEQ ID NO: 953 


trp-1 segment 8 


90 nts 


SEQ ID NO: 954 


Polypeptide encoded by SEQ ID NO: 953 


30 aa 


SEQ ID NO: 955 


trp-1 segment 9 


90 nts 


SEQ ID NO: 956 


Polypeptide encoded by SEQ ID NO: 955 


30 aa 


SEQ ID NO: 957 


trp-1 segment 10 


90 nts 


SEQ ID NO: 958 


Polypeptide encoded by SEQ ID NO: 957 


30 aa


SEQ ID NO: 959


trp-1 segment 11


90nts 


SEQ ID NO: 960 


Polypeptide encoded by SEQ ID NO: 959 


30 aa 


SEQ ID NO: 961 


trp-1 segment 12 


90nts 


SEQ ID NO: 962 


Polypeptide encoded by SEQ ID NO: 961 


30 aa 


SEQ ID NO: 963 


trp-1 segment 13 


90nts 


SEQ ID NO: 964 


Polypeptide encoded by SEQ ID NO: 963 


30 aa 


SEQ ID NO: 965 


trp-1 segment 14 


90nts 


SEQ ID NO: 966 


Polypeptide encoded by SEQ ID NO: 965 


30 aa 


SEQ ID NO: 967 


trp-1 segment 15 


90 nts 


SEQ ID NO: 968 


Polypeptide encoded by SEQ ID NO: 967 


30 aa 


SEQ ID NO: 969 


trp-1 segment 16 


81 nts 


SEQ ID NO: 970 


Polypeptide encoded by SEQ ID NO: 969 


27 aa 


SEQ ID NO: 971 


tyros segment 1 


90 nts 


SEQ ID NO: 972 


Polypeptide encoded by SEQ ID NO: 971 


30 aa 


SEQ ID NO: 973 


tyros segment 2 


90 nts 


SEQ ID NO: 974


Polypeptide encoded by SEQ ID NO: 973


30 aa 


SEQ ID NO: 975 


tyros segment 3 


90 nts 


SEQ ID NO: 976 


Polypeptide encoded by SEQ ID NO: 975 


30 aa 


SEQ ID NO: 977 


tyros segment 4 


90 nts 


SEQ ID NO: 978 


Polypeptide encoded by SEQ ID NO: 977 


30 aa 


SEQ ID NO: 979 


tyros segment 5 


90 nts 


SEQ ID NO: 980


Polypeptide encoded by SEQ ID NO: 979 


30 aa 


SEQ ID NO: 981


tyros segment 6


90 nts 


SEQ ID NO: 982 


Polypeptide encoded by SEQ ID NO: 981 


30 aa


SEQ ID NO: 983 | tyros segment 7 | 90 nts
SEQ ID NO: 984 | Polypeptide encoded by SEQ ID NO: 983 | 30 aa
SEQ ID NO: 985 | tyros segment 8 | 90 nts
SEQ ID NO: 986 | Polypeptide encoded by SEQ ID NO: 985 | 30 aa
SEQ ID NO: 987 | tyros segment 9 | 90 nts
SEQ ID NO: 988 | Polypeptide encoded by SEQ ID NO: 987 | 30 aa
SEQ ID NO: 989 | tyros segment 10 | 90 nts
SEQ ID NO: 990 | Polypeptide encoded by SEQ ID NO: 989 | 30 aa
SEQ ID NO: 991 | tyros segment 11 | 90 nts
SEQ ID NO: 992 | Polypeptide encoded by SEQ ID NO: 991 | 30 aa
SEQ ID NO: 993 | tyros segment 12 | 90 nts
SEQ ID NO: 994 | Polypeptide encoded by SEQ ID NO: 993 | 30 aa
SEQ ID NO: 995 | tyros segment 13 | 90 nts
SEQ ID NO: 996 | Polypeptide encoded by SEQ ID NO: 995 | 30 aa
SEQ ID NO: 997 | tyros segment 14 | 90 nts
SEQ ID NO: 998 | Polypeptide encoded by SEQ ID NO: 997 | 30 aa
SEQ ID NO: 999 | tyros segment 15 | 90 nts
SEQ ID NO: 1000 | Polypeptide encoded by SEQ ID NO: 999 | 30 aa
SEQ ID NO: 1001 | tyros segment 16 | 90 nts
SEQ ID NO: 1002 | Polypeptide encoded by SEQ ID NO: 1001 | 30 aa
SEQ ID NO: 1003 | tyros segment 17 | 90 nts
SEQ ID NO: 1004 | Polypeptide encoded by SEQ ID NO: 1003 | 30 aa
SEQ ID NO: 1005 | tyros segment 18 | 90 nts
SEQ ID NO: 1006 | Polypeptide encoded by SEQ ID NO: 1005 | 30 aa
SEQ ID NO: 1007 | tyros segment 19 | 90 nts
SEQ ID NO: 1008 | Polypeptide encoded by SEQ ID NO: 1007 | 30 aa
SEQ ID NO: 1009 | tyros segment 20 | 90 nts
SEQ ID NO: 1010 | Polypeptide encoded by SEQ ID NO: 1009 | 30 aa
SEQ ID NO: 1011 | tyros segment 21 | 90 nts
SEQ ID NO: 1012 | Polypeptide encoded by SEQ ID NO: 1011 | 30 aa
SEQ ID NO: 1013 | tyros segment 22 | 90 nts
SEQ ID NO: 1014 | Polypeptide encoded by SEQ ID NO: 1013 | 30 aa
SEQ ID NO: 1015 | tyros segment 23 | 90 nts
SEQ ID NO: 1016 | Polypeptide encoded by SEQ ID NO: 1015 | 30 aa
SEQ ID NO: 1017 | tyros segment 24 | 90 nts
SEQ ID NO: 1018 | Polypeptide encoded by SEQ ID NO: 1017 | 30 aa
SEQ ID NO: 1019 | tyros segment 25 | 90 nts
SEQ ID NO: 1020 | Polypeptide encoded by SEQ ID NO: 1019 | 30 aa
SEQ ID NO: 1021 | tyros segment 26 | 90 nts
SEQ ID NO: 1022 | Polypeptide encoded by SEQ ID NO: 1021 | 30 aa
SEQ ID NO: 1023 | tyros segment 27 | 90 nts
SEQ ID NO: 1024 | Polypeptide encoded by SEQ ID NO: 1023 | 30 aa
SEQ ID NO: 1025 | tyros segment 28 | 90 nts
SEQ ID NO: 1026 | Polypeptide encoded by SEQ ID NO: 1025 | 30 aa
SEQ ID NO: 1027 | tyros segment 29 | 90 nts
SEQ ID NO: 1028 | Polypeptide encoded by SEQ ID NO: 1027 | 30 aa
SEQ ID NO: 1029 | tyros segment 30 | 90 nts
SEQ ID NO: 1030 | Polypeptide encoded by SEQ ID NO: 1029 | 30 aa
SEQ ID NO: 1031 | tyros segment 31 | 90 nts
SEQ ID NO: 1032 | Polypeptide encoded by SEQ ID NO: 1031 | 30 aa
SEQ ID NO: 1033 | tyros segment 32 | 90 nts
SEQ ID NO: 1034 | Polypeptide encoded by SEQ ID NO: 1033 | 30 aa
SEQ ID NO: 1035 | tyros segment 33 | 90 nts
SEQ ID NO: 1036 | Polypeptide encoded by SEQ ID NO: 1035 | 30 aa
SEQ ID NO: 1037 | tyros segment 34 | 90 nts
SEQ ID NO: 1038 | Polypeptide encoded by SEQ ID NO: 1037 | 30 aa
SEQ ID NO: 1039 | tyros segment 35 | 69 nts
SEQ ID NO: 1040 | Polypeptide encoded by SEQ ID NO: 1039 | 23 aa
SEQ ID NO: 1041 | trp2 segment 1 | 90 nts
SEQ ID NO: 1042 | Polypeptide encoded by SEQ ID NO: 1041 | 30 aa
SEQ ID NO: 1043 | trp2 segment 2 | 90 nts
SEQ ID NO: 1044 | Polypeptide encoded by SEQ ID NO: 1043 | 30 aa
SEQ ID NO: 1045 | trp2 segment 3 | 90 nts
SEQ ID NO: 1046 | Polypeptide encoded by SEQ ID NO: 1045 | 30 aa
SEQ ID NO: 1047 | trp2 segment 4 | 90 nts
SEQ ID NO: 1048 | Polypeptide encoded by SEQ ID NO: 1047 | 30 aa
SEQ ID NO: 1049 | trp2 segment 5 | 90 nts
SEQ ID NO: 1050 | Polypeptide encoded by SEQ ID NO: 1049 | 30 aa
SEQ ID NO: 1051 | trp2 segment 6 | 90 nts
SEQ ID NO: 1052 | Polypeptide encoded by SEQ ID NO: 1051 | 30 aa
SEQ ID NO: 1053 | trp2 segment 7 | 90 nts
SEQ ID NO: 1054 | Polypeptide encoded by SEQ ID NO: 1053 | 30 aa
SEQ ID NO: 1055 | trp2 segment 8 | 90 nts
SEQ ID NO: 1056 | Polypeptide encoded by SEQ ID NO: 1055 | 30 aa
SEQ ID NO: 1057 | trp2 segment 9 | 90 nts
SEQ ID NO: 1058 | Polypeptide encoded by SEQ ID NO: 1057 | 30 aa
SEQ ID NO: 1059 | trp2 segment 10 | 90 nts
SEQ ID NO: 1060 | Polypeptide encoded by SEQ ID NO: 1059 | 30 aa
SEQ ID NO: 1061 | trp2 segment 11 | 90 nts
SEQ ID NO: 1062 | Polypeptide encoded by SEQ ID NO: 1061 | 30 aa
SEQ ID NO: 1063 | trp2 segment 12 | 90 nts
SEQ ID NO: 1064 | Polypeptide encoded by SEQ ID NO: 1063 | 30 aa
SEQ ID NO: 1065 | trp2 segment 13 | 90 nts
SEQ ID NO: 1066 | Polypeptide encoded by SEQ ID NO: 1065 | 30 aa
SEQ ID NO: 1067 | trp2 segment 14 | 90 nts
SEQ ID NO: 1068 | Polypeptide encoded by SEQ ID NO: 1067 | 30 aa
SEQ ID NO: 1069 | trp2 segment 15 | 90 nts
SEQ ID NO: 1070 | Polypeptide encoded by SEQ ID NO: 1069 | 30 aa
SEQ ID NO: 1071 | trp2 segment 16 | 90 nts
SEQ ID NO: 1072 | Polypeptide encoded by SEQ ID NO: 1071 | 30 aa
SEQ ID NO: 1073 | trp2 segment 17 | 90 nts
SEQ ID NO: 1074 | Polypeptide encoded by SEQ ID NO: 1073 | 30 aa
SEQ ID NO: 1075 | trp2 segment 18 | 90 nts
SEQ ID NO: 1076 | Polypeptide encoded by SEQ ID NO: 1075 | 30 aa
SEQ ID NO: 1077 | trp2 segment 19 | 90 nts
SEQ ID NO: 1078 | Polypeptide encoded by SEQ ID NO: 1077 | 30 aa
SEQ ID NO: 1079 | trp2 segment 20 | 90 nts
SEQ ID NO: 1080 | Polypeptide encoded by SEQ ID NO: 1079 | 30 aa
SEQ ID NO: 1081 | trp2 segment 21 | 90 nts
SEQ ID NO: 1082 | Polypeptide encoded by SEQ ID NO: 1081 | 30 aa
SEQ ID NO: 1083 | trp2 segment 22 | 90 nts
SEQ ID NO: 1084 | Polypeptide encoded by SEQ ID NO: 1083 | 30 aa
SEQ ID NO: 1085 | trp2 segment 23 | 90 nts
SEQ ID NO: 1086 | Polypeptide encoded by SEQ ID NO: 1085 | 30 aa
SEQ ID NO: 1087 | trp2 segment 24 | 90 nts
SEQ ID NO: 1088 | Polypeptide encoded by SEQ ID NO: 1087 | 30 aa
SEQ ID NO: 1089 | trp2 segment 25 | 90 nts
SEQ ID NO: 1090 | Polypeptide encoded by SEQ ID NO: 1089 | 30 aa
SEQ ID NO: 1091 | trp2 segment 26 | 90 nts
SEQ ID NO: 1092 | Polypeptide encoded by SEQ ID NO: 1091 | 30 aa
SEQ ID NO: 1093 | trp2 segment 27 | 90 nts
SEQ ID NO: 1094 | Polypeptide encoded by SEQ ID NO: 1093 | 30 aa
SEQ ID NO: 1095 | trp2 segment 28 | 90 nts
SEQ ID NO: 1096 | Polypeptide encoded by SEQ ID NO: 1095 | 30 aa
SEQ ID NO: 1097 | trp2 segment 29 | 90 nts
SEQ ID NO: 1098 | Polypeptide encoded by SEQ ID NO: 1097 | 30 aa
SEQ ID NO: 1099 | trp2 segment 30 | 90 nts
SEQ ID NO: 1100 | Polypeptide encoded by SEQ ID NO: 1099 | 30 aa
SEQ ID NO: 1101 | trp2 segment 31 | 90 nts
SEQ ID NO: 1102 | Polypeptide encoded by SEQ ID NO: 1101 | 30 aa
SEQ ID NO: 1103 | trp2 segment 32 | 90 nts
SEQ ID NO: 1104 | Polypeptide encoded by SEQ ID NO: 1103 | 30 aa
SEQ ID NO: 1105 | trp2 segment 33 | 90 nts
SEQ ID NO: 1106 | Polypeptide encoded by SEQ ID NO: 1105 | 30 aa
SEQ ID NO: 1107 | trp2 segment 34 | 84 nts
SEQ ID NO: 1108 | Polypeptide encoded by SEQ ID NO: 1107 | 28 aa
SEQ ID NO: 1109 | MC1R segment 1 | 90 nts
SEQ ID NO: 1110 | Polypeptide encoded by SEQ ID NO: 1109 | 30 aa
SEQ ID NO: 1111 | MC1R segment 2 | 90 nts
SEQ ID NO: 1112 | Polypeptide encoded by SEQ ID NO: 1111 | 30 aa
SEQ ID NO: 1113 | MC1R segment 3 | 90 nts
SEQ ID NO: 1114 | Polypeptide encoded by SEQ ID NO: 1113 | 30 aa
SEQ ID NO: 1115 | MC1R segment 4 | 90 nts
SEQ ID NO: 1116 | Polypeptide encoded by SEQ ID NO: 1115 | 30 aa
SEQ ID NO: 1117 | MC1R segment 5 | 90 nts
SEQ ID NO: 1118 | Polypeptide encoded by SEQ ID NO: 1117 | 30 aa
SEQ ID NO: 1119 | MC1R segment 6 | 90 nts
SEQ ID NO: 1120 | Polypeptide encoded by SEQ ID NO: 1119 | 30 aa
SEQ ID NO: 1121 | MC1R segment 7 | 90 nts
SEQ ID NO: 1122 | Polypeptide encoded by SEQ ID NO: 1121 | 30 aa
SEQ ID NO: 1123 | MC1R segment 8 | 90 nts
SEQ ID NO: 1124 | Polypeptide encoded by SEQ ID NO: 1123 | 30 aa
SEQ ID NO: 1125 | MC1R segment 9 | 90 nts
SEQ ID NO: 1126 | Polypeptide encoded by SEQ ID NO: 1125 | 30 aa
SEQ ID NO: 1127 | MC1R segment 10 | 90 nts
SEQ ID NO: 1128 | Polypeptide encoded by SEQ ID NO: 1127 | 30 aa
SEQ ID NO: 1129 | MC1R segment 11 | 90 nts
SEQ ID NO: 1130 | Polypeptide encoded by SEQ ID NO: 1129 | 30 aa
SEQ ID NO: 1131 | MC1R segment 12 | 90 nts
SEQ ID NO: 1132 | Polypeptide encoded by SEQ ID NO: 1131 | 30 aa
SEQ ID NO: 1133 | MC1R segment 13 | 90 nts
SEQ ID NO: 1134 | Polypeptide encoded by SEQ ID NO: 1133 | 30 aa
SEQ ID NO: 1135 | MC1R segment 14 | 90 nts
SEQ ID NO: 1136 | Polypeptide encoded by SEQ ID NO: 1135 | 30 aa
SEQ ID NO: 1137 | MC1R segment 15 | 90 nts
SEQ ID NO: 1138 | Polypeptide encoded by SEQ ID NO: 1137 | 30 aa
SEQ ID NO: 1139 | MC1R segment 16 | 90 nts
SEQ ID NO: 1140 | Polypeptide encoded by SEQ ID NO: 1139 | 30 aa
SEQ ID NO: 1141 | MC1R segment 17 | 90 nts
SEQ ID NO: 1142 | Polypeptide encoded by SEQ ID NO: 1141 | 30 aa
SEQ ID NO: 1143 | MC1R segment 18 | 90 nts
SEQ ID NO: 1144 | Polypeptide encoded by SEQ ID NO: 1143 | 30 aa
SEQ ID NO: 1145 | MC1R segment 19 | 90 nts
SEQ ID NO: 1146 | Polypeptide encoded by SEQ ID NO: 1145 | 30 aa
SEQ ID NO: 1147 | MC1R segment 20 | 90 nts
SEQ ID NO: 1148 | Polypeptide encoded by SEQ ID NO: 1147 | 30 aa
SEQ ID NO: 1149 | MC1R segment 21 | 63 nts
SEQ ID NO: 1150 | Polypeptide encoded by SEQ ID NO: 1149 | 21 aa
SEQ ID NO: 1151 | MUC1F segment 1 | 90 nts
SEQ ID NO: 1152 | Polypeptide encoded by SEQ ID NO: 1151 | 30 aa
SEQ ID NO: 1153 | MUC1F segment 2 | 90 nts
SEQ ID NO: 1154 | Polypeptide encoded by SEQ ID NO: 1153 | 30 aa
SEQ ID NO: 1155 | MUC1F segment 3 | 90 nts
SEQ ID NO: 1156 | Polypeptide encoded by SEQ ID NO: 1155 | 30 aa
SEQ ID NO: 1157 | MUC1F segment 4 | 90 nts
SEQ ID NO: 1158 | Polypeptide encoded by SEQ ID NO: 1157 | 30 aa
SEQ ID NO: 1159 | MUC1F segment 5 | 90 nts
SEQ ID NO: 1160 | Polypeptide encoded by SEQ ID NO: 1159 | 30 aa
SEQ ID NO: 1161 | MUC1F segment 6 | 90 nts
SEQ ID NO: 1162 | Polypeptide encoded by SEQ ID NO: 1161 | 30 aa
SEQ ID NO: 1163 | MUC1F segment 7 | 90 nts
SEQ ID NO: 1164 | Polypeptide encoded by SEQ ID NO: 1163 | 30 aa
SEQ ID NO: 1165 | MUC1F segment 8 | 72 nts
SEQ ID NO: 1166 | Polypeptide encoded by SEQ ID NO: 1165 | 24 aa
SEQ ID NO: 1167 | MUC1R segment 1 | 90 nts
SEQ ID NO: 1168 | Polypeptide encoded by SEQ ID NO: 1167 | 30 aa
SEQ ID NO: 1169 | MUC1R segment 2 | 90 nts
SEQ ID NO: 1170 | Polypeptide encoded by SEQ ID NO: 1169 | 30 aa
SEQ ID NO: 1171 | MUC1R segment 3 | 90 nts
SEQ ID NO: 1172 | Polypeptide encoded by SEQ ID NO: 1171 | 30 aa
SEQ ID NO: 1173 | MUC1R segment 4 | 90 nts
SEQ ID NO: 1174 | Polypeptide encoded by SEQ ID NO: 1173 | 30 aa
SEQ ID NO: 1175 | MUC1R segment 5 | 90 nts
SEQ ID NO: 1176 | Polypeptide encoded by SEQ ID NO: 1175 | 30 aa
SEQ ID NO: 1177 | MUC1R segment 6 | 90 nts
SEQ ID NO: 1178 | Polypeptide encoded by SEQ ID NO: 1177 | 30 aa
SEQ ID NO: 1179 | MUC1R segment 7 | 90 nts
SEQ ID NO: 1180 | Polypeptide encoded by SEQ ID NO: 1179 | 30 aa
SEQ ID NO: 1181 | MUC1R segment 8 | 90 nts
SEQ ID NO: 1182 | Polypeptide encoded by SEQ ID NO: 1181 | 30 aa
SEQ ID NO: 1183 | MUC1R segment 9 | 90 nts
SEQ ID NO: 1184 | Polypeptide encoded by SEQ ID NO: 1183 | 30 aa
SEQ ID NO: 1185 | MUC1R segment 10 | 90 nts
SEQ ID NO: 1186 | Polypeptide encoded by SEQ ID NO: 1185 | 30 aa
SEQ ID NO: 1187 | MUC1R segment 11 | 90 nts
SEQ ID NO: 1188 | Polypeptide encoded by SEQ ID NO: 1187 | 30 aa
SEQ ID NO: 1189 | MUC1R segment 12 | 90 nts
SEQ ID NO: 1190 | Polypeptide encoded by SEQ ID NO: 1189 | 30 aa
SEQ ID NO: 1191 | MUC1R segment 13 | 90 nts
SEQ ID NO: 1192 | Polypeptide encoded by SEQ ID NO: 1191 | 30 aa
SEQ ID NO: 1193 | MUC1R segment 14 | 90 nts
SEQ ID NO: 1194 | Polypeptide encoded by SEQ ID NO: 1193 | 30 aa
SEQ ID NO: 1195 | MUC1R segment 15 | 90 nts
SEQ ID NO: 1196 | Polypeptide encoded by SEQ ID NO: 1195 | 30 aa
SEQ ID NO: 1197 | MUC1R segment 16 | 90 nts
SEQ ID NO: 1198 | Polypeptide encoded by SEQ ID NO: 1197 | 30 aa
SEQ ID NO: 1199 | MUC1R segment 17 | 90 nts
SEQ ID NO: 1200 | Polypeptide encoded by SEQ ID NO: 1199 | 30 aa
SEQ ID NO: 1201 | MUC1R segment 18 | 90 nts
SEQ ID NO: 1202 | Polypeptide encoded by SEQ ID NO: 1201 | 30 aa
SEQ ID NO: 1203 | MUC1R segment 19 | 90 nts
SEQ ID NO: 1204 | Polypeptide encoded by SEQ ID NO: 1203 | 30 aa
SEQ ID NO: 1205 | MUC1R segment 20 | 90 nts
SEQ ID NO: 1206 | Polypeptide encoded by SEQ ID NO: 1205 | 30 aa
SEQ ID NO: 1207 | MUC1R segment 21 | 48 nts
SEQ ID NO: 1208 | Polypeptide encoded by SEQ ID NO: 1207 | 16 aa
SEQ ID NO: 1209 | Differentiation Savine | 16638 nts
SEQ ID NO: 1210 | Polypeptide encoded by SEQ ID NO: 1209 | 5546 aa
SEQ ID NO: 1211 | BAGE segment 1 | 90 nts
SEQ ID NO: 1212 | Polypeptide encoded by SEQ ID NO: 1211 | 30 aa
SEQ ID NO: 1213 | BAGE segment 2 | 90 nts
SEQ ID NO: 1214 | Polypeptide encoded by SEQ ID NO: 1213 | 30 aa
SEQ ID NO: 1215 | BAGE segment 3 | 51 nts
SEQ ID NO: 1216 | Polypeptide encoded by SEQ ID NO: 1215 | 17 aa
SEQ ID NO: 1217 | GAGE-1 segment 1 | 90 nts
SEQ ID NO: 1218 | Polypeptide encoded by SEQ ID NO: 1217 | 30 aa
SEQ ID NO: 1219 | GAGE-1 segment 2 | 90 nts
SEQ ID NO: 1220 | Polypeptide encoded by SEQ ID NO: 1219 | 30 aa
SEQ ID NO: 1221 | GAGE-1 segment 3 | 90 nts
SEQ ID NO: 1222 | Polypeptide encoded by SEQ ID NO: 1221 | 30 aa
SEQ ID NO: 1223 | GAGE-1 segment 4 | 90 nts
SEQ ID NO: 1224 | Polypeptide encoded by SEQ ID NO: 1223 | 30 aa
SEQ ID NO: 1225 | GAGE-1 segment 5 | 90 nts
SEQ ID NO: 1226 | Polypeptide encoded by SEQ ID NO: 1225 | 30 aa
SEQ ID NO: 1227 | GAGE-1 segment 6 | 90 nts
SEQ ID NO: 1228 | Polypeptide encoded by SEQ ID NO: 1227 | 30 aa
SEQ ID NO: 1229 | GAGE-1 segment 7 | 90 nts
SEQ ID NO: 1230 | Polypeptide encoded by SEQ ID NO: 1229 | 30 aa
SEQ ID NO: 1231 | GAGE-1 segment 8 | 90 nts
SEQ ID NO: 1232 | Polypeptide encoded by SEQ ID NO: 1231 | 30 aa
SEQ ID NO: 1233 | GAGE-1 segment 9 | 66 nts
SEQ ID NO: 1234 | Polypeptide encoded by SEQ ID NO: 1233 | 22 aa
SEQ ID NO: 1235 | gp100In4 segment 1 | 90 nts
SEQ ID NO: 1236 | Polypeptide encoded by SEQ ID NO: 1235 | 30 aa
SEQ ID NO: 1237 | gp100In4 segment 2 | 90 nts
SEQ ID NO: 1238 | Polypeptide encoded by SEQ ID NO: 1237 | 30 aa
SEQ ID NO: 1239 | gp100In4 segment 3 | 75 nts
SEQ ID NO: 1240 | Polypeptide encoded by SEQ ID NO: 1239 | 25 aa
SEQ ID NO: 1241 | MAGE-1 segment 1 | 90 nts
SEQ ID NO: 1242 | Polypeptide encoded by SEQ ID NO: 1241 | 30 aa
SEQ ID NO: 1243 | MAGE-1 segment 2 | 90 nts
SEQ ID NO: 1244 | Polypeptide encoded by SEQ ID NO: 1243 | 30 aa
SEQ ID NO: 1245 | MAGE-1 segment 3 | 90 nts
SEQ ID NO: 1246 | Polypeptide encoded by SEQ ID NO: 1245 | 30 aa
SEQ ID NO: 1247 | MAGE-1 segment 4 | 90 nts
SEQ ID NO: 1248 | Polypeptide encoded by SEQ ID NO: 1247 | 30 aa
SEQ ID NO: 1249 | MAGE-1 segment 5 | 90 nts
SEQ ID NO: 1250 | Polypeptide encoded by SEQ ID NO: 1249 | 30 aa
SEQ ID NO: 1251 | MAGE-1 segment 6 | 90 nts
SEQ ID NO: 1252 | Polypeptide encoded by SEQ ID NO: 1251 | 30 aa
SEQ ID NO: 1253 | MAGE-1 segment 7 | 90 nts
SEQ ID NO: 1254 | Polypeptide encoded by SEQ ID NO: 1253 | 30 aa
SEQ ID NO: 1255 | MAGE-1 segment 8 | 90 nts
SEQ ID NO: 1256 | Polypeptide encoded by SEQ ID NO: 1255 | 30 aa
SEQ ID NO: 1257 | MAGE-1 segment 9 | 90 nts
SEQ ID NO: 1258 | Polypeptide encoded by SEQ ID NO: 1257 | 30 aa
SEQ ID NO: 1259 | MAGE-1 segment 10 | 90 nts
SEQ ID NO: 1260 | Polypeptide encoded by SEQ ID NO: 1259 | 30 aa
SEQ ID NO: 1261 | MAGE-1 segment 11 | 90 nts
SEQ ID NO: 1262 | Polypeptide encoded by SEQ ID NO: 1261 | 30 aa
SEQ ID NO: 1263 | MAGE-1 segment 12 | 90 nts
SEQ ID NO: 1264 | Polypeptide encoded by SEQ ID NO: 1263 | 30 aa
SEQ ID NO: 1265 | MAGE-1 segment 13 | 90 nts
SEQ ID NO: 1266 | Polypeptide encoded by SEQ ID NO: 1265 | 30 aa
SEQ ID NO: 1267 | MAGE-1 segment 14 | 90 nts
SEQ ID NO: 1268 | Polypeptide encoded by SEQ ID NO: 1267 | 30 aa
SEQ ID NO: 1269 | MAGE-1 segment 15 | 90 nts
SEQ ID NO: 1270 | Polypeptide encoded by SEQ ID NO: 1269 | 30 aa
SEQ ID NO: 1271 | MAGE-1 segment 16 | 90 nts
SEQ ID NO: 1272 | Polypeptide encoded by SEQ ID NO: 1271 | 30 aa
SEQ ID NO: 1273 | MAGE-1 segment 17 | 90 nts
SEQ ID NO: 1274 | Polypeptide encoded by SEQ ID NO: 1273 | 30 aa
SEQ ID NO: 1275 | MAGE-1 segment 18 | 90 nts
SEQ ID NO: 1276 | Polypeptide encoded by SEQ ID NO: 1275 | 30 aa
SEQ ID NO: 1277 | MAGE-1 segment 19 | 90 nts
SEQ ID NO: 1278 | Polypeptide encoded by SEQ ID NO: 1277 | 30 aa
SEQ ID NO: 1279 | MAGE-1 segment 20 | 84 nts
SEQ ID NO: 1280 | Polypeptide encoded by SEQ ID NO: 1279 | 28 aa
SEQ ID NO: 1281 | MAGE-3 segment 1 | 90 nts
SEQ ID NO: 1282 | Polypeptide encoded by SEQ ID NO: 1281 | 30 aa
SEQ ID NO: 1283 | MAGE-3 segment 2 | 90 nts
SEQ ID NO: 1284 | Polypeptide encoded by SEQ ID NO: 1283 | 30 aa
SEQ ID NO: 1285 | MAGE-3 segment 3 | 90 nts
SEQ ID NO: 1286 | Polypeptide encoded by SEQ ID NO: 1285 | 30 aa
SEQ ID NO: 1287 | MAGE-3 segment 4 | 90 nts
SEQ ID NO: 1288 | Polypeptide encoded by SEQ ID NO: 1287 | 30 aa
SEQ ID NO: 1289 | MAGE-3 segment 5 | 90 nts
SEQ ID NO: 1290 | Polypeptide encoded by SEQ ID NO: 1289 | 30 aa
SEQ ID NO: 1291 | MAGE-3 segment 6 | 90 nts
SEQ ID NO: 1292 | Polypeptide encoded by SEQ ID NO: 1291 | 30 aa
SEQ ID NO: 1293 | MAGE-3 segment 7 | 90 nts
SEQ ID NO: 1294 | Polypeptide encoded by SEQ ID NO: 1293 | 30 aa
SEQ ID NO: 1295 | MAGE-3 segment 8 | 90 nts
SEQ ID NO: 1296 | Polypeptide encoded by SEQ ID NO: 1295 | 30 aa
SEQ ID NO: 1297 | MAGE-3 segment 9 | 90 nts
SEQ ID NO: 1298 | Polypeptide encoded by SEQ ID NO: 1297 | 30 aa
SEQ ID NO: 1299 | MAGE-3 segment 10 | 90 nts
SEQ ID NO: 1300 | Polypeptide encoded by SEQ ID NO: 1299 | 30 aa
SEQ ID NO: 1301 | MAGE-3 segment 11 | 90 nts
SEQ ID NO: 1302 | Polypeptide encoded by SEQ ID NO: 1301 | 30 aa
SEQ ID NO: 1303 | MAGE-3 segment 12 | 90 nts
SEQ ID NO: 1304 | Polypeptide encoded by SEQ ID NO: 1303 | 30 aa
SEQ ID NO: 1305 | MAGE-3 segment 13 | 90 nts
SEQ ID NO: 1306 | Polypeptide encoded by SEQ ID NO: 1305 | 30 aa
SEQ ID NO: 1307 | MAGE-3 segment 14 | 90 nts
SEQ ID NO: 1308 | Polypeptide encoded by SEQ ID NO: 1307 | 30 aa
SEQ ID NO: 1309 | MAGE-3 segment 15 | 90 nts
SEQ ID NO: 1310 | Polypeptide encoded by SEQ ID NO: 1309 | 30 aa
SEQ ID NO: 1311 | MAGE-3 segment 16 | 90 nts
SEQ ID NO: 1312 | Polypeptide encoded by SEQ ID NO: 1311 | 30 aa
SEQ ID NO: 1313 | MAGE-3 segment 17 | 90 nts
SEQ ID NO: 1314 | Polypeptide encoded by SEQ ID NO: 1313 | 30 aa
SEQ ID NO: 1315 | MAGE-3 segment 18 | 90 nts
SEQ ID NO: 1316 | Polypeptide encoded by SEQ ID NO: 1315 | 30 aa
SEQ ID NO: 1317 | MAGE-3 segment 19 | 90 nts
SEQ ID NO: 1318 | Polypeptide encoded by SEQ ID NO: 1317 | 30 aa
SEQ ID NO: 1319 | MAGE-3 segment 20 | 90 nts
SEQ ID NO: 1320 | Polypeptide encoded by SEQ ID NO: 1319 | 30 aa
SEQ ID NO: 1321 | MAGE-3 segment 21 | 54 nts
SEQ ID NO: 1322 | Polypeptide encoded by SEQ ID NO: 1321 | 18 aa
SEQ ID NO: 1323 | PRAME segment 1 | 90 nts
SEQ ID NO: 1324 | Polypeptide encoded by SEQ ID NO: 1323 | 30 aa
SEQ ID NO: 1325 | PRAME segment 2 | 90 nts
SEQ ID NO: 1326 | Polypeptide encoded by SEQ ID NO: 1325 | 30 aa
SEQ ID NO: 1327 | PRAME segment 3 | 90 nts
SEQ ID NO: 1328 | Polypeptide encoded by SEQ ID NO: 1327 | 30 aa
SEQ ID NO: 1329 | PRAME segment 4 | 90 nts
SEQ ID NO: 1330 | Polypeptide encoded by SEQ ID NO: 1329 | 30 aa
SEQ ID NO: 1331 | PRAME segment 5 | 90 nts
SEQ ID NO: 1332 | Polypeptide encoded by SEQ ID NO: 1331 | 30 aa
SEQ ID NO: 1333 | PRAME segment 6 | 90 nts
SEQ ID NO: 1334 | Polypeptide encoded by SEQ ID NO: 1333 | 30 aa
SEQ ID NO: 1335 | PRAME segment 7 | 90 nts
SEQ ID NO: 1336 | Polypeptide encoded by SEQ ID NO: 1335 | 30 aa
SEQ ID NO: 1337 | PRAME segment 8 | 90 nts
SEQ ID NO: 1338 | Polypeptide encoded by SEQ ID NO: 1337 | 30 aa
SEQ ID NO: 1339 | PRAME segment 9 | 90 nts
SEQ ID NO: 1340 | Polypeptide encoded by SEQ ID NO: 1339 | 30 aa
SEQ ID NO: 1341 | PRAME segment 10 | 90 nts
SEQ ID NO: 1342 | Polypeptide encoded by SEQ ID NO: 1341 | 30 aa
SEQ ID NO: 1343 | PRAME segment 11 | 90 nts
SEQ ID NO: 1344 | Polypeptide encoded by SEQ ID NO: 1343 | 30 aa
SEQ ID NO: 1345 | PRAME segment 12 | 90 nts
SEQ ID NO: 1346 | Polypeptide encoded by SEQ ID NO: 1345 | 30 aa
SEQ ID NO: 1347 | PRAME segment 13 | 90 nts
SEQ ID NO: 1348 | Polypeptide encoded by SEQ ID NO: 1347 | 30 aa
SEQ ID NO: 1349 | PRAME segment 14 | 90 nts
SEQ ID NO: 1350 | Polypeptide encoded by SEQ ID NO: 1349 | 30 aa
SEQ ID NO: 1351 | PRAME segment 15 | 90 nts
SEQ ID NO: 1352 | Polypeptide encoded by SEQ ID NO: 1351 | 30 aa
SEQ ID NO: 1353 | PRAME segment 16 | 90 nts
SEQ ID NO: 1354 | Polypeptide encoded by SEQ ID NO: 1353 | 30 aa
SEQ ID NO: 1355 | PRAME segment 17 | 90 nts
SEQ ID NO: 1356 | Polypeptide encoded by SEQ ID NO: 1355 | 30 aa
SEQ ID NO: 1357 | PRAME segment 18 | 90 nts
SEQ ID NO: 1358 | Polypeptide encoded by SEQ ID NO: 1357 | 30 aa
SEQ ID NO: 1359 | PRAME segment 19 | 90 nts
SEQ ID NO: 1360 | Polypeptide encoded by SEQ ID NO: 1359 | 30 aa
SEQ ID NO: 1361 | PRAME segment 20 | 90 nts
SEQ ID NO: 1362 | Polypeptide encoded by SEQ ID NO: 1361 | 30 aa
SEQ ID NO: 1363 | PRAME segment 21 | 90 nts
SEQ ID NO: 1364 | Polypeptide encoded by SEQ ID NO: 1363 | 30 aa
SEQ ID NO: 1365 | PRAME segment 22 | 90 nts
SEQ ID NO: 1366 | Polypeptide encoded by SEQ ID NO: 1365 | 30 aa
SEQ ID NO: 1367 | PRAME segment 23 | 90 nts
SEQ ID NO: 1368 | Polypeptide encoded by SEQ ID NO: 1367 | 30 aa
SEQ ID NO: 1369 | PRAME segment 24 | 90 nts
SEQ ID NO: 1370 | Polypeptide encoded by SEQ ID NO: 1369 | 30 aa
SEQ ID NO: 1371 | PRAME segment 25 | 90 nts
SEQ ID NO: 1372 | Polypeptide encoded by SEQ ID NO: 1371 | 30 aa
SEQ ID NO: 1373 | PRAME segment 26 | 90 nts
SEQ ID NO: 1374 | Polypeptide encoded by SEQ ID NO: 1373 | 30 aa
SEQ ID NO: 1375 | PRAME segment 27 | 90 nts
SEQ ID NO: 1376 | Polypeptide encoded by SEQ ID NO: 1375 | 30 aa
SEQ ID NO: 1377 | PRAME segment 28 | 90 nts
SEQ ID NO: 1378 | Polypeptide encoded by SEQ ID NO: 1377 | 30 aa
SEQ ID NO: 1379 | PRAME segment 29 | 90 nts
SEQ ID NO: 1380 | Polypeptide encoded by SEQ ID NO: 1379 | 30 aa
SEQ ID NO: 1381 | PRAME segment 30 | 90 nts
SEQ ID NO: 1382 | Polypeptide encoded by SEQ ID NO: 1381 | 30 aa
SEQ ID NO: 1383 | PRAME segment 31 | 90 nts
SEQ ID NO: 1384 | Polypeptide encoded by SEQ ID NO: 1383 | 30 aa
SEQ ID NO: 1385 | PRAME segment 32 | 90 nts
SEQ ID NO: 1386 | Polypeptide encoded by SEQ ID NO: 1385 | 30 aa
SEQ ID NO: 1387 | PRAME segment 33 | 90 nts
SEQ ID NO: 1388 | Polypeptide encoded by SEQ ID NO: 1387 | 30 aa
SEQ ID NO: 1389 | PRAME segment 34 | 54 nts
SEQ ID NO: 1390 | Polypeptide encoded by SEQ ID NO: 1389 | 18 aa
SEQ ID NO: 1391 | TRP2IN2 segment 1 | 90 nts
SEQ ID NO: 1392 | Polypeptide encoded by SEQ ID NO: 1391 | 30 aa
SEQ ID NO: 1393 | TRP2IN2 segment 2 | 90 nts
SEQ ID NO: 1394 | Polypeptide encoded by SEQ ID NO: 1393 | 30 aa
SEQ ID NO: 1395 | TRP2IN2 segment 3 | 84 nts
SEQ ID NO: 1396 | Polypeptide encoded by SEQ ID NO: 1395 | 28 aa
SEQ ID NO: 1397 | NYNSO1a segment 1 | 90 nts
SEQ ID NO: 1398 | Polypeptide encoded by SEQ ID NO: 1397 | 30 aa
SEQ ID NO: 1399 | NYNSO1a segment 2 | 90 nts
SEQ ID NO: 1400 | Polypeptide encoded by SEQ ID NO: 1399 | 30 aa
SEQ ID NO: 1401 | NYNSO1a segment 3 | 90 nts
SEQ ID NO: 1402 | Polypeptide encoded by SEQ ID NO: 1401 | 30 aa
SEQ ID NO: 1403 | NYNSO1a segment 4 | 90 nts
SEQ ID NO: 1404 | Polypeptide encoded by SEQ ID NO: 1403 | 30 aa
SEQ ID NO: 1405 | NYNSO1a segment 5 | 90 nts
SEQ ID NO: 1406 | Polypeptide encoded by SEQ ID NO: 1405 | 30 aa
SEQ ID NO: 1407 | NYNSO1a segment 6 | 90 nts
SEQ ID NO: 1408 | Polypeptide encoded by SEQ ID NO: 1407 | 30 aa
SEQ ID NO: 1409 | NYNSO1a segment 7 | 90 nts
SEQ ID NO: 1410 | Polypeptide encoded by SEQ ID NO: 1409 | 30 aa
SEQ ID NO: 1411 | NYNSO1a segment 8 | 90 nts
SEQ ID NO: 1412 | Polypeptide encoded by SEQ ID NO: 1411 | 30 aa
SEQ ID NO: 1413 | NYNSO1a segment 9 | 90 nts
SEQ ID NO: 1414 | Polypeptide encoded by SEQ ID NO: 1413 | 30 aa
SEQ ID NO: 1415 | NYNSO1a segment 10 | 90 nts
SEQ ID NO: 1416 | Polypeptide encoded by SEQ ID NO: 1415 | 30 aa
SEQ ID NO: 1417 | NYNSO1a segment 11 | 90 nts
SEQ ID NO: 1418 | Polypeptide encoded by SEQ ID NO: 1417 | 30 aa
SEQ ID NO: 1419 | NYNSO1a segment 12 | 57 nts
SEQ ID NO: 1420 | Polypeptide encoded by SEQ ID NO: 1419 | 19 aa
SEQ ID NO: 1421 | NYNSO1b segment 1 | 90 nts
SEQ ID NO: 1422 | Polypeptide encoded by SEQ ID NO: 1421 | 30 aa
SEQ ID NO: 1423 | NYNSO1b segment 2 | 90 nts
SEQ ID NO: 1424 | Polypeptide encoded by SEQ ID NO: 1423 | 30 aa
SEQ ID NO: 1425 | NYNSO1b segment 3 | 90 nts
SEQ ID NO: 1426 | Polypeptide encoded by SEQ ID NO: 1425 | 30 aa
SEQ ID NO: 1427 | NYNSO1b segment 4 | 51 nts
SEQ ID NO: 1428 | Polypeptide encoded by SEQ ID NO: 1427 | 17 aa
SEQ ID NO: 1429 | LAGE1 segment 1 | 90 nts
SEQ ID NO: 1430 | Polypeptide encoded by SEQ ID NO: 1429 | 30 aa
SEQ ID NO: 1431 | LAGE1 segment 2 | 90 nts
SEQ ID NO: 1432 | Polypeptide encoded by SEQ ID NO: 1431 | 30 aa
SEQ ID NO: 1433 | LAGE1 segment 3 | 90 nts
SEQ ID NO: 1434 | Polypeptide encoded by SEQ ID NO: 1433 | 30 aa
SEQ ID NO: 1435 | LAGE1 segment 4 | 90 nts
SEQ ID NO: 1436 | Polypeptide encoded by SEQ ID NO: 1435 | 30 aa
SEQ ID NO: 1437 | LAGE1 segment 5 | 90 nts
SEQ ID NO: 1438 | Polypeptide encoded by SEQ ID NO: 1437 | 30 aa
SEQ ID NO: 1439 | LAGE1 segment 6 | 90 nts
SEQ ID NO: 1440 | Polypeptide encoded by SEQ ID NO: 1439 | 30 aa
SEQ ID NO: 1441 | LAGE1 segment 7 | 90 nts
SEQ ID NO: 1442 | Polypeptide encoded by SEQ ID NO: 1441 | 30 aa
SEQ ID NO: 1443 | LAGE1 segment 8 | 90 nts
SEQ ID NO: 1444 | Polypeptide encoded by SEQ ID NO: 1443 | 30 aa
SEQ ID NO: 1445 | LAGE1 segment 9 | 90 nts
SEQ ID NO: 1446 | Polypeptide encoded by SEQ ID NO: 1445 | 30 aa
SEQ ID NO: 1447 | LAGE1 segment 10 | 90 nts
SEQ ID NO: 1448 | Polypeptide encoded by SEQ ID NO: 1447 | 30 aa
SEQ ID NO: 1449 | LAGE1 segment 11 | 90 nts
SEQ ID NO: 1450 | Polypeptide encoded by SEQ ID NO: 1449 | 30 aa
SEQ ID NO: 1451 | LAGE1 segment 12 | 57 nts
SEQ ID NO: 1452 | Polypeptide encoded by SEQ ID NO: 1451 | 19 aa
SEQ ID NO: 1453 | Melanoma cancer specific Savine | 10623 nts
SEQ ID NO: 1454 | Polypeptide encoded by SEQ ID NO: 1453 | 3541 aa
SEQ ID NO: 1455 | Figure 16 A1S1 99mer | 99 nts
SEQ ID NO: 1456 | Figure 16 A1S2 100mer | 100 nts
SEQ ID NO: 1457 | Figure 16 A1S3 100mer | 100 nts
SEQ ID NO: 1458 | Figure 16 A1S4 100mer | 100 nts
SEQ ID NO: 1459 | Figure 16 A1S5 100mer | 100 nts
SEQ ID NO: 1460 | Figure 16 A1S6 99mer | 99 nts
SEQ ID NO: 1461 | Figure 16 A1S7 97mer | 99 nts
SEQ ID NO: 1462 | Figure 16 A1S8 100mer | 100 nts



WO 01/90197

PCT/AU01/00622



SEQ ID NO: 1463 | Figure 16 A1S9 100mer | 100 nts
SEQ ID NO: 1464 | Figure 16 A1S10 75mer | 76 nts
SEQ ID NO: 1465 | Figure 16 A1F 20mer | 20 nts
SEQ ID NO: 1466 | Figure 16 A1R 20mer | 20 nts
SEQ ID NO: 1467 | Amino acid sequence of immunostimulatory domain of an invasin protein from Yersinia spp. | 16 aa
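
The segment lengths listed above follow the triplet code: each polypeptide length in amino acids (aa) is one third of the length of its encoding polynucleotide in nucleotides (nts). A minimal Python sketch of that arithmetic is given below. The function name `encoded_length_aa` is hypothetical and is not part of the specification, and the sketch assumes an in-frame segment made up of whole codons with no stop codon counted.

```python
def encoded_length_aa(length_nts: int) -> int:
    """Return the number of amino acids encoded by an in-frame segment.

    Assumption: the segment consists of whole codons (3 nts each) and
    no stop codon is included in the nucleotide count.
    """
    if length_nts < 0:
        raise ValueError("length must be non-negative")
    if length_nts % 3 != 0:
        raise ValueError("segment length is not a whole number of codons")
    return length_nts // 3


# (nts, aa) pairs copied from rows of the table above.
for nts, aa in [(90, 30), (81, 27), (16638, 5546), (10623, 3541)]:
    assert encoded_length_aa(nts) == aa
```

The check applies only to rows that pair a polynucleotide with its encoded polypeptide; the oligonucleotide rows of Figure 16 have no polypeptide counterpart.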

DETAILED DESCRIPTION OF THE INVENTION
1. Definitions

The articles "a" and "an" are used herein to refer to one or to more than one (i.e.,
to at least one) of the grammatical object of the article. By way of example, "an element"
means one element or more than one element.

As used herein, the term "about" refers to a quantity, level, value, dimension,
size, or amount that varies by as much as 30%, preferably by as much as 20%, and more
preferably by as much as 10% to a reference quantity, level, value, dimension, size, or
amount.
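
One way to read this definition is as a bound on relative deviation from the reference value. The Python sketch below illustrates that reading only. The function name `is_about` and its `tolerance` parameter are hypothetical, and treating the percentages as symmetric fractional bounds is an assumption that the specification does not state.

```python
def is_about(value: float, reference: float, tolerance: float = 0.30) -> bool:
    """Return True if `value` deviates from `reference` by at most `tolerance`.

    `tolerance` is a fraction of the reference: 0.30 for the broadest bound,
    0.20 for the preferred bound, 0.10 for the more preferred bound.
    Assumption: the deviation is symmetric about the reference.
    """
    if reference == 0:
        # A relative bound is undefined at zero; only an exact match qualifies.
        return value == 0
    return abs(value - reference) <= tolerance * abs(reference)


print(is_about(125, 100))        # within 30% of the reference
print(is_about(125, 100, 0.20))  # outside the preferred 20% bound
```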

By "antigen-binding molecule" is meant a molecule that has binding affinity for a
target antigen. It will be understood that this term extends to immunoglobulins,
immunoglobulin fragments and non-immunoglobulin derived protein frameworks that
exhibit antigen-binding activity.

The term "clade" as used herein refers to a hypothetical species of an organism
and its descendants or a monophyletic group of organisms. Clades carry a definition, based
on ancestry, and a diagnosis, based on synapomorphies. It should be noted that diagnoses
of clades could change while definitions do not.

Throughout this specification, unless the context requires otherwise, the words
"comprise", "comprises" and "comprising" will be understood to imply the inclusion of a
stated step or element or group of steps or elements but not the exclusion of any other step
or element or group of steps or elements.

By "expression vector" is meant any autonomous genetic element capable of
directing the synthesis of a protein encoded by the vector. Such expression vectors are
known by practitioners in the art.

As used herein, the term "function" refers to a biological, enzymatic, or 
therapeutic function. 
"Homology" refers to the percentage number of amino acids that are identical or 
constitute conservative substitutions as defined in Table B infra. Homology may be 
determined using sequence comparison programs such as GAP (Devereux et al. 1984, 
Nucleic Acids Research 12, 387-395). In this way, sequences of a similar or substantially 
different length to those cited herein might be compared by insertion of gaps into the 
alignment, such gaps being determined, for example, by the comparison algorithm used by 
GAP. 
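
The homology definition above can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned to equal length (for example by GAP), uses one-letter amino acid codes, treats the first sequence as the "original residue" side of Table B, and counts gap columns ("-") as non-matching. The function name, the gap handling and the choice of denominator are illustrative assumptions, not part of the specification.

```python
# Conservative substitutions of Table B, written in one-letter codes
# (original residue -> exemplary substitutions).
TABLE_B = {
    "A": "S", "R": "K", "N": "QH", "D": "E", "C": "S", "Q": "N", "E": "D",
    "G": "P", "H": "NQ", "I": "LV", "L": "IV", "K": "RQE", "M": "LI",
    "F": "MLY", "S": "T", "T": "S", "W": "Y", "Y": "WF", "V": "IL",
}


def percent_homology(reference: str, candidate: str) -> float:
    """Percentage of aligned positions that are identical or that carry a
    Table B conservative substitution of the reference residue.

    Assumption: both strings are pre-aligned, equal in length, and use '-'
    for gaps; the denominator is the full alignment length.
    """
    if len(reference) != len(candidate) or not reference:
        raise ValueError("sequences must be pre-aligned, non-empty and equal in length")
    matches = sum(
        1
        for ref, cand in zip(reference, candidate)
        if ref != "-" and (ref == cand or cand in TABLE_B.get(ref, ""))
    )
    return 100.0 * matches / len(reference)
```

For instance, aligning `ACDK` against `SCEA` gives three homologous positions out of four (Ala to Ser and Asp to Glu are conservative, Cys is identical, Lys to Ala is not).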

To enhance an immune response ("immunoenhancement"), as is well-known in 
the art, means to increase an animal's capacity to respond to foreign or disease-specific 
antigens (e.g., cancer antigens), i.e., those cells primed to attack such antigens are increased 
in number, activity, and ability to detect and destroy those antigens. Strength of 
immune response is measured by standard tests including: direct measurement of 
peripheral blood lymphocytes by means known to the art; natural killer cell cytotoxicity 
assays (see, e.g., Provinciali M. et al. (1992, J. Immunol. Meth. 155: 19-24)); cell 
proliferation assays (see, e.g., Vollenweider, I. and Groscurth, P. J. (1992, J. Immunol. 
Meth. 149: 133-135)); immunoassays of immune cells and subsets (see, e.g., Loeffler, D. 
A., et al. (1992, Cytom. 13: 169-174); Rivoltini, L., et al. (1992, Can. Immunol. 
Immunother. 34: 241-251)); or skin tests for cell-mediated immunity (see, e.g., Chang, A. 
E. et al. (1993, Cancer Res. 53: 1043-1050)). Any statistically significant increase in 
strength of immune response as measured by the foregoing tests is considered "enhanced 
immune response", "immunoenhancement" or "immunopotentiation" as used herein. 
Enhanced immune response is also indicated by physical manifestations such as fever and 
inflammation, as well as healing of systemic and local infections, and reduction of 
symptoms in disease, i.e., decrease in tumour size, alleviation of symptoms of a disease or 
condition including, but not restricted to, leprosy, tuberculosis, malaria, aphthous ulcers, 
herpetic and papillomatous warts, gingivitis, atherosclerosis, the concomitants of AIDS 
such as Kaposi's sarcoma, bronchial infections, and the like. Such physical manifestations 
also define "enhanced immune response", "immunoenhancement" or 
"immunopotentiation" as used herein. 

Reference herein to "immuno-interactive" includes reference to any interaction, 
reaction, or other form of association between molecules and in particular where one of the 
molecules is, or mimics, a component of the immune system. 
By "isolated" is meant material that is substantially or essentially free from 
components that normally accompany it in its native state. 

By "modulating" is meant increasing or decreasing, either directly or indirectly, 
an immune response against a target antigen of a member selected from the group 
consisting of a cancer and an organism, preferably a pathogenic organism. 

By "natural gene" is meant a gene that naturally encodes a protein. 

The term "natural polypeptide" as used herein refers to a polypeptide that exists 
in nature. 

By "obtained from" is meant that a sample such as, for example, a polynucleotide 
extract or polypeptide extract is isolated from, or derived from, a particular source of the 
host. For example, the extract can be obtained from a tissue or a biological fluid isolated 
directly from the host. 

The term "oligonucleotide" as used herein refers to a polymer composed of a 
multiplicity of nucleotide residues (deoxyribonucleotides or ribonucleotides, or related 
structural variants or synthetic analogues thereof) linked via phosphodiester bonds (or 
related structural variants or synthetic analogues thereof). Thus, while the term 
"oligonucleotide" typically refers to a nucleotide polymer in which the nucleotide residues 
and linkages between them are naturally occurring, it will be understood that the term also 
includes within its scope various analogues including, but not restricted to, peptide nucleic 
acids (PNAs), phosphoramidates, phosphorothioates, methyl phosphonates, 2-O-methyl 
ribonucleic acids, and the like. The exact size of the molecule can vary depending on the 
particular application. An oligonucleotide is typically rather short in length, generally from 
about 10 to 30 nucleotide residues, but the term can refer to molecules of any length, 
although the term "polynucleotide" or "nucleic acid" is typically used for large 
oligonucleotides. 

By "operably linked" is meant that transcriptional and translational regulatory 
polynucleotides are positioned relative to a polypeptide-encoding polynucleotide in such a 
manner that the polynucleotide is transcribed and the polypeptide is translated. 
The term "parent polypeptide" as used herein typically refers to a polypeptide 
encoded by a natural gene. However, it is possible that the parent polypeptide corresponds 
to a protein that is not naturally-occurring but has been engineered using recombinant 
techniques. In this instance, a polynucleotide encoding the parent polypeptide may 
comprise different but synonymous codons relative to a natural gene encoding the same 
polypeptide. Alternatively, the parent polypeptide may not correspond to a natural 
polypeptide sequence. For example, the parent polypeptide may comprise one or more 
consensus sequences common to a plurality of polypeptides. 

The term "patient" refers to a human or other mammalian patient and includes any 
individual it is desired to examine or treat using the methods of the invention. However, it 
will be understood that "patient" does not imply that symptoms are present. Suitable 
mammals that fall within the scope of the invention include, but are not restricted to, 
primates, livestock animals (e.g., sheep, cows, horses, donkeys, pigs), laboratory test 
animals (e.g., rabbits, mice, rats, guinea pigs, hamsters), companion animals (e.g., cats, 
dogs) and captive wild animals (e.g., foxes, deer, dingoes). 

By "pharmaceutically-acceptable carrier" is meant a solid or liquid filler, diluent 
or encapsulating substance that can be safely used in topical or systemic administration to a 
mammal. 

"Polypeptide", "peptide" and "protein" are used interchangeably herein to refer to 
a polymer of amino acid residues and to variants and synthetic analogues of the same. 
Thus, these terms apply to amino acid polymers in which one or more amino acid residues 
is a synthetic non-naturally occurring amino acid, such as a chemical analogue of a 
corresponding naturally occurring amino acid, as well as to naturally-occurring amino acid 
polymers. 

The term "polynucleotide" or "nucleic acid" as used herein designates mRNA, 
RNA, cRNA, cDNA or DNA. The term typically refers to oligonucleotides greater than 30 
nucleotide residues in length. 

By "primer" is meant an oligonucleotide which, when paired with a strand of 
DNA, is capable of initiating the synthesis of a primer extension product in the presence of 
a suitable polymerising agent. The primer is preferably single-stranded for maximum 
efficiency in amplification but can alternatively be double-stranded. A primer must be 
sufficiently long to prime the synthesis of extension products in the presence of the 
polymerisation agent. The length of the primer depends on many factors, including 
application, temperature to be employed, template reaction conditions, other reagents, and 
source of primers. For example, depending on the complexity of the target sequence, the 
oligonucleotide primer typically contains 15 to 35 or more nucleotide residues, although it 
can contain fewer nucleotide residues. Primers can be large polynucleotides, such as from 
about 35 nucleotides to several kilobases or more. A primer can be selected to be 
"substantially complementary" to the sequence on the template to which it is designed to 
hybridise and serve as a site for the initiation of synthesis. By "substantially 
complementary", it is meant that the primer is sufficiently complementary to hybridise 
with a target polynucleotide. Preferably, the primer contains no mismatches with the 
template to which it is designed to hybridise but this is not essential. For example, 
non-complementary nucleotide residues can be attached to the 5' end of the primer, with the 
remainder of the primer sequence being complementary to the template. Alternatively, 
non-complementary nucleotide residues or a stretch of non-complementary nucleotide 
residues can be interspersed into a primer, provided that the primer sequence has sufficient 
complementarity with the sequence of the template to hybridise therewith and thereby form 
a template for synthesis of the extension product of the primer. 

"Probe" refers to a molecule that binds to a specific sequence or sub-sequence or 
other moiety of another molecule. Unless otherwise indicated, the term "probe" typically 
refers to a polynucleotide probe that binds to another polynucleotide, often called the 
"target polynucleotide", through complementary base pairing. Probes can bind target 
polynucleotides lacking complete sequence complementarity with the probe, depending on 
the stringency of the hybridisation conditions. Probes can be labelled directly or indirectly. 

By "recombinant polypeptide" is meant a polypeptide made using recombinant 
techniques, i.e., through the expression of a recombinant or synthetic polynucleotide. 

Terms used to describe sequence relationships between two or more 
polynucleotides or polypeptides include "reference sequence", "comparison window", 
"sequence identity", "percentage of sequence identity" and "substantial identity". A 
"reference sequence" is at least 12 but frequently 15 to 18 and often at least 25 monomer 
units, inclusive of nucleotides and amino acid residues, in length. Because two 
polynucleotides may each comprise (1) a sequence (i.e., only a portion of the complete 
polynucleotide sequence) that is similar between the two polynucleotides, and (2) a 
sequence that is divergent between the two polynucleotides, sequence comparisons 
between two (or more) polynucleotides are typically performed by comparing sequences of 
the two polynucleotides over a "comparison window" to identify and compare local 
regions of sequence similarity. A "comparison window" refers to a conceptual segment of 
at least 50 contiguous positions, usually about 50 to about 100, more usually about 100 to 
about 150 in which a sequence is compared to a reference sequence of the same number of 
contiguous positions after the two sequences are optimally aligned. The comparison 
window may comprise additions or deletions (i.e., gaps) of about 20% or less as compared 
to the reference sequence (which does not comprise additions or deletions) for optimal 
alignment of the two sequences. Optimal alignment of sequences for aligning a comparison 
window may be conducted by computerised implementations of algorithms (GAP, 
BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package Release 
7.0, Genetics Computer Group, 575 Science Drive Madison, WI, USA) or by inspection 
and the best alignment (i.e., resulting in the highest percentage homology over the 
comparison window) generated by any of the various methods selected. Reference also 
may be made to the BLAST family of programs as for example disclosed by Altschul et 
al., 1997, Nucl. Acids Res. 25:3389. A detailed discussion of sequence analysis can be 
found in Unit 19.3 of Ausubel et al., "Current Protocols in Molecular Biology", John 
Wiley & Sons Inc, 1994-1998, Chapter 15. 

The term "sequence identity" as used herein refers to the extent that sequences 
are identical on a nucleotide-by-nucleotide basis or an amino acid-by-amino acid basis 
over a window of comparison. Thus, a "percentage of sequence identity" is calculated by 
comparing two optimally aligned sequences over the window of comparison, determining 
the number of positions at which the identical nucleic acid base (e.g., A, T, C, G, I) or the 
identical amino acid residue (e.g., Ala, Pro, Ser, Thr, Gly, Val, Leu, Ile, Phe, Tyr, Trp, Lys, 
Arg, His, Asp, Glu, Asn, Gln, Cys and Met) occurs in both sequences to yield the number 
of matched positions, dividing the number of matched positions by the total number of 
positions in the window of comparison (i.e., the window size), and multiplying the result 
by 100 to yield the percentage of sequence identity. For the purposes of the present 
invention, "sequence identity" will be understood to mean the "match percentage" 
calculated by the DNASIS computer program (Version 2.5 for Windows; available from 
Hitachi Software Engineering Co., Ltd., South San Francisco, California, USA) using 
standard defaults as used in the reference manual accompanying the software. 
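
The arithmetic described above (matched positions divided by window size, multiplied by 100) can be sketched as follows. This is only an illustration of the stated formula for two sequences that are assumed to be already optimally aligned over the comparison window; it is not a reimplementation of the DNASIS "match percentage", whose defaults are not reproduced here. The function name and the use of "-" for gaps are assumptions.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of sequence identity over a comparison window.

    Assumption: seq_a and seq_b are the two optimally aligned sequences
    restricted to the window, so they have equal length; '-' marks a gap
    and never counts as a match.
    """
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("aligned sequences must be non-empty and equal in length")
    # Number of positions at which the identical base or residue occurs in both.
    matched = sum(1 for a, b in zip(seq_a, seq_b) if a == b and a != "-")
    # Divide by the window size and scale to a percentage.
    return 100.0 * matched / len(seq_a)
```

For example, `percent_identity("ATCGATCGAT", "ATCGTTCGAA")` compares ten positions, eight of which match.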

The term "synthetic polynucleotide" as used herein refers to a polynucleotide 
formed in vitro by the manipulation of a polynucleotide into a form not normally found in 
nature. For example, the synthetic polynucleotide can be in the form of an expression 
vector. Generally, such expression vectors include transcriptional and translational 
regulatory polynucleotides operably linked to the polynucleotide. 

The term "synonymous codon" as used herein refers to a codon having a different 
nucleotide sequence than another codon but encoding the same amino acid as that other 
codon. 

By "translational efficiency" is meant the efficiency of a cell's protein synthesis 
machinery to incorporate the amino acid encoded by a codon into a nascent polypeptide 
chain. This efficiency can be evidenced, for example, by the rate at which the cell is able to 
synthesise the polypeptide from an RNA template comprising the codon, or by the amount 
of the polypeptide synthesised from such a template. 

By "vector" is meant a polynucleotide molecule, preferably a DNA molecule 
derived, for example, from a plasmid, bacteriophage, yeast or virus, into which a 
polynucleotide can be inserted or cloned. A vector preferably contains one or more unique 
restriction sites and can be capable of autonomous replication in a defined host cell 
including a target cell or tissue or a progenitor cell or tissue thereof, or be integrable with 
the genome of the defined host such that the cloned sequence is reproducible. Accordingly, 
the vector can be an autonomously replicating vector, i.e., a vector that exists as an 
extrachromosomal entity, the replication of which is independent of chromosomal 
replication, e.g., a linear or closed circular plasmid, an extrachromosomal element, a 
minichromosome, or an artificial chromosome. The vector can contain any means for 
assuring self-replication. Alternatively, the vector can be one which, when introduced into 
the host cell, is integrated into the genome and replicated together with the chromosome(s) 
into which it has been integrated. A vector system can comprise a single vector or plasmid, 
two or more vectors or plasmids, which together contain the total DNA to be introduced 
into the genome of the host cell, or a transposon. The choice of the vector will typically 
depend on the compatibility of the vector with the host cell into which the vector is to be 
introduced. In the present case, the vector is preferably a viral or viral-derived vector, 
which is operably functional in animal and preferably mammalian cells. Such a vector may 
be derived from a poxvirus, an adenovirus or yeast. The vector can also include a selection 
marker such as an antibiotic resistance gene that can be used for selection of suitable 
transformants. Examples of such resistance genes are known to those of skill in the art and 
include the nptII gene that confers resistance to the antibiotics kanamycin and G418 
(Geneticin®) and the hph gene which confers resistance to the antibiotic hygromycin B. 
2. Synthetic polypeptides 

The inventors have surprisingly discovered that the structure of a parent 
polypeptide can be disrupted sufficiently to impede, abrogate or otherwise alter at least one 
function of the parent polypeptide, while simultaneously minimising the destruction of 
potentially useful epitopes that are present in the parent polypeptide, by fusing, coupling or 
otherwise linking together different segments of the parent polypeptide in a different 
relationship relative to their linkage in the parent polypeptide. As a result of this change in 
relationship, the sequence of the linked segments in the resulting synthetic polypeptide is 
different to a sequence contained within the parent polypeptide. The synthetic polypeptides 
of the invention are useful as immunopotentiating agents, and are referred to elsewhere in 
the specification as scrambled antigen vaccines, super attenuated vaccines or "Savines". 

Thus, the invention broadly resides in a synthetic polypeptide comprising a 
plurality of different segments of at least one parent polypeptide, wherein said segments 
are linked together in a different relationship relative to their linkage in the at least one 
parent polypeptide. 

It is preferable but not essential that the segments in said synthetic polypeptide are 
linked sequentially in a different order or arrangement relative to that of corresponding 
segments in said at least one parent polypeptide. For example, in the case of a parent 
polypeptide that comprises four contiguous or overlapping segments A-B-C-D, these 
segments may be linked in 23 other possible orders to form a synthetic polypeptide. These 
orders may be selected from the group consisting of: A-B-D-C, A-C-B-D, A-C-D-B, 
A-D-B-C, A-D-C-B, B-A-C-D, B-A-D-C, B-C-A-D, B-C-D-A, B-D-A-C, B-D-C-A, C-A-B-D, 
C-A-D-B, C-B-A-D, C-B-D-A, C-D-A-B, C-D-B-A, D-A-B-C, D-A-C-B, D-B-A-C, 
D-B-C-A, D-C-A-B, and D-C-B-A. Although the rearrangement of the segments is preferably 
random, it is especially preferable to exclude or otherwise minimise rearrangements that 
result in complete or partial reassembly of the parent sequence (e.g., ADBC, BACD, 
DABC). It will be appreciated, however, that the probability of such complete or partial 
reassembly diminishes as the number of segments for rearrangement increases. 
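
The enumeration above can be reproduced with a short sketch. It lists every order of four segments other than the parent order, then filters out orders in which any two segments that are adjacent in the parent remain adjacent in the same orientation. That filter is one possible reading of "partial reassembly" (it rejects the examples ADBC, BACD and DABC); the function names are illustrative, and segment labels are assumed to be unique.

```python
from itertools import permutations


def other_orders(parent):
    """All orders of the parent's segments except the parent order itself."""
    return [order for order in permutations(parent) if order != tuple(parent)]


def retains_parent_junction(order, parent):
    """True if any pair of segments adjacent in the parent (e.g. B then C)
    is still adjacent, in the same orientation, in the candidate order."""
    parent_pairs = set(zip(parent, parent[1:]))
    return any(pair in parent_pairs for pair in zip(order, order[1:]))


parent = ["A", "B", "C", "D"]
candidates = other_orders(parent)  # the 23 alternative orders
scrambled = [o for o in candidates if not retains_parent_junction(o, parent)]

for order in scrambled:
    print("-".join(order))
```

Under this reading, 11 of the 23 alternative orders avoid every parent junction. Applying the same filter to larger segment sets shows how the share of orders that reassemble part of the parent changes with the number of segments.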

The order of the segments is suitably shuffled, reordered or otherwise rearranged 
relative to the order in which they exist in the parent polypeptide so that the structure of the 
polypeptide is disrupted sufficiently to impede, abrogate or otherwise alter at least one 
function associated with the parent polypeptide. Preferably, the segments of the parent 
polypeptide are randomly rearranged in the synthetic polypeptide. 

The parent polypeptide is suitably a polypeptide that is associated with a disease 
or condition. For example, the parent polypeptide may be a polypeptide expressed by a 
pathogenic organism or a cancer. Alternatively, the parent polypeptide can be a self 
peptide related to an autoimmune disease including, but not limited to, diseases such as 
diabetes (e.g., juvenile diabetes), multiple sclerosis, rheumatoid arthritis, myasthenia 
gravis, atopic dermatitis, psoriasis and ankylosing spondylitis. Accordingly, the 
synthetic molecules of the present invention may also have utility for the induction of 
tolerance in a subject afflicted with an autoimmune disease or condition or with an allergy 
or other condition to which tolerance is desired. For example, tolerance may be induced by 
contacting an immature dendritic cell of the individual to be treated with a synthetic 
polypeptide of the invention or by expressing in an immature dendritic cell a synthetic 
polynucleotide of the invention. Tolerance may also be induced against antigens causing 
allergic responses (e.g., asthma, hay fever). In this case, the parent polypeptide is suitably 
an allergenic protein including, but not restricted to, house-dust-mite allergenic proteins as 
for example described by Thomas and Smith (1998, Allergy, 53(9): 821-832). 

The pathogenic organism includes, but is not restricted to, yeast, a virus, a 
bacterium, and a parasite. Any natural host of the pathogenic organism is contemplated by 
the present invention and includes, but is not limited to, mammals, avians and fish. In a 
preferred embodiment, the pathogenic organism is a virus, which may be an RNA virus or 
a DNA virus. Preferably, the RNA virus is Human Immunodeficiency Virus (HIV), 
Poliovirus, Influenza virus, Rous sarcoma virus, or a Flavivirus such as Japanese 
encephalitis virus. In a preferred embodiment, the RNA virus is a Hepatitis virus including, 
but not limited to, Hepatitis strains A, B and C. Suitably, the DNA virus is a Herpesvirus 
including, but not limited to, Herpes simplex virus, Epstein-Barr virus, Cytomegalovirus 
and Parvovirus. In a preferred embodiment, the virus is HIV and the parent polypeptide is 
suitably selected from env, gag, pol, vif, vpr, tat, rev, vpu and nef, or a combination thereof. 
In an alternate preferred embodiment, the virus is Hepatitis C1a virus and the parent 
polypeptide is the Hepatitis C1a virus polyprotein. 
In another embodiment, the pathogenic organism is a bacterium, which includes, 
but is not restricted to, Neisseria species, Meningococcal species, Haemophilus species, 
Salmonella species, Streptococcal species, Legionella species and Mycobacterium species. 

In yet another embodiment, the pathogenic organism is a parasite, which includes, 
but is not restricted to, Plasmodium species, Schistosoma species, Leishmania species, 
Trypanosoma species, Toxoplasma species and Giardia species. 

Any cancer or tumour is contemplated by the present invention. For example, the 
cancer or tumour includes, but is not restricted to, melanoma, lung cancer, breast cancer, 
cervical cancer, prostate cancer, colon cancer, pancreatic cancer, stomach cancer, bladder 
cancer, kidney cancer, post transplant lymphoproliferative disease (PTLD), Hodgkin's 
Lymphoma and the like. Preferably, the cancer or tumour relates to melanoma. In a 
preferred embodiment of this type, the parent polypeptide is a melanocyte differentiation 
antigen which is suitably selected from gp100, MART, TRP-1, Tyros, TRP2, MC1R, 
MUC1F, MUC1R or a combination thereof. In an alternate preferred embodiment of this 
type, the parent polypeptide is a melanoma-specific antigen which is suitably selected from 
BAGE, GAGE-1, gp100In4, MAGE-1, MAGE-3, PRAME, TRP2IN2, NYNSO1a, 
NYNSO1b, LAGE1 or a combination thereof. 

In a preferred embodiment, the segments are selected on the basis of size. A 
segment according to the invention may be of any suitable size that can be utilised to elicit 
an immune response against an antigen encoded by the parent polypeptide. A number of 
factors can influence the choice of segment size. For example, the size of a segment should 
be preferably chosen such that it includes, or corresponds to the size of, T cell epitopes and 
their processing requirement. Practitioners in the art will recognise that class I-restricted T 
cell epitopes can be between 8 and 10 amino acids in length and if placed next to unnatural 
flanking residues, such epitopes can generally require 2 to 3 natural flanking amino acids 
to ensure that they are efficiently processed and presented. Class II-restricted T cell 
epitopes can range between 12 and 25 amino acids in length and may not require natural 
flanking residues for efficient proteolytic processing although it is believed that natural 
flanking residues may play a role. Another important feature of class II-restricted epitopes 
is that they generally contain a core of 9-10 amino acids in the middle which bind 
specifically to class II MHC molecules with flanking sequences either side of this core 
stabilising binding by associating with conserved structures on either side of class II MHC 
antigens in a sequence independent manner (Brown et al., 1993). Thus the functional 
region of class II-restricted epitopes is typically less than 15 amino acids long. The size of 
linear B cell epitopes and the factors affecting their processing, like class II-restricted 
epitopes, are quite variable although such epitopes are frequently smaller in size than 15 
amino acids. From the foregoing, it is preferable, but not essential, that the size of the 
segment is at least 4 amino acids, preferably at least 7 amino acids, more preferably at least 
12 amino acids, more preferably at least 20 amino acids and more preferably at least 30 
amino acids. Suitably, the size of the segment is less than 2000 amino acids, more 
preferably less than 1000 amino acids, more preferably less than 500 amino acids, more 
preferably less than 200 amino acids, more preferably less than 100 amino acids, more 
preferably less than 80 amino acids and even more preferably less than 60 amino acids and 
still even more preferably less than 40 amino acids. In this regard, it is preferable that the 
size of the segments is as small as possible so that the synthetic polypeptide adopts a 
functionally different structure relative to the structure of the parent polypeptide. It is also 
preferable that the size of the segments is large enough to minimise loss of T cell epitopes. 
In an especially preferred embodiment, the size of the segment is about 30 amino acids. 

An optional spacer may be utilised to space adjacent segments relative to each 
other. Accordingly, an optional spacer may be interposed between some or all of the 
segments. The spacer suitably alters proteolytic processing and/or presentation of adjacent 
segment(s). In a preferred embodiment of this type, the spacer promotes or otherwise 
enhances proteolytic processing and/or presentation of adjacent segment(s). Preferably, the 
spacer comprises at least one amino acid. The at least one amino acid is suitably a neutral 
amino acid. The neutral amino acid is preferably alanine. Alternatively, the at least one 
amino acid is cysteine. 
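
As a minimal sketch of how segments might be joined with a spacer, the hypothetical helper below concatenates segments in a chosen order and interposes the same spacer between every adjacent pair. The single-alanine default follows the preference stated above; placing a spacer at every junction (rather than only some), the function name and the example sequences are assumptions made for illustration.

```python
def link_segments(segments, order, spacer="A"):
    """Concatenate segments in the given order, interposing `spacer`
    (one-letter amino acid codes; "A" is alanine) between neighbours.

    `order` is a rearrangement of the segment indices, e.g. [2, 0, 1].
    """
    if sorted(order) != list(range(len(segments))):
        raise ValueError("order must use each segment index exactly once")
    return spacer.join(segments[i] for i in order)


# Hypothetical three-residue segments, linked in the order third, first, second.
print(link_segments(["MKT", "LLV", "GGS"], [2, 0, 1], spacer="AA"))
```

Passing `spacer=""` links the segments directly, which corresponds to the case where no spacer is used.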

In a preferred embodiment, segments are selected such that they have partial 
sequence identity or homology with one or more other segments. Suitably, at one or both 
ends of a respective segment there is comprised at least 4 contiguous amino acids, 
preferably at least 7 contiguous amino acids, more preferably at least 10 contiguous amino 
acids, more preferably at least 15 contiguous amino acids and even more preferably at least 
20 contiguous amino acids that are identical to, or homologous with, an amino acid 
sequence contained within one or more other of said segments. Preferably, at the or each 
end of a respective segment there is comprised less than 500 contiguous amino acids, more 
preferably less than 200 contiguous amino acids, more preferably less than 100 contiguous 
amino acids, more preferably less than 50 contiguous amino acids, more preferably less 
than 40 contiguous amino acids, and even more preferably less than 30 contiguous amino 
acids that are identical to, or homologous with, an amino acid sequence contained within 
one or more other of said segments. Such sequence overlap (also referred to elsewhere in 
the specification as "overlapping fragments" or "overlapping segments") is preferable to 
ensure potential epitopes at segment boundaries are not lost and to ensure that epitopes at 
or near segment boundaries are processed efficiently if placed beside or near amino acids 
that inhibit processing. Preferably, the segment size is about twice the size of the overlap. 

In a preferred embodiment, when segments have partial sequence homology 
therebetween, the homologous sequences suitably comprise conserved and/or non- 
conserved amino acid differences. Exemplary conservative substitutions are listed in the 
following table. 

TABLE B 

Original Residue      Exemplary Substitutions 
Ala                   Ser 
Arg                   Lys 
Asn                   Gln, His 
Asp                   Glu 
Cys                   Ser 
Gln                   Asn 
Glu                   Asp 
Gly                   Pro 
His                   Asn, Gln 
Ile                   Leu, Val 
Leu                   Ile, Val 
Lys                   Arg, Gln, Glu 
Met                   Leu, Ile 
Phe                   Met, Leu, Tyr 
Ser                   Thr 
Thr                   Ser 
Trp                   Tyr 
Tyr                   Trp, Phe 
Val                   Ile, Leu 



Conserved or non-conserved differences may correspond to polymorphisms in 
corresponding parent polypeptides. Polymorphic polypeptides are expressed by various 
pathogenic organisms and cancers. For example, the polymorphic polypeptides may be 
expressed by different viral strains or clades or by cancers in different individuals. 

Sequence overlap between respective segments is preferable to minimise 
destruction of any epitope sequences that may result from any shuffling or rearrangement 
of the segments relative to their existing order in the parent polypeptide. If overlapping 
segments as described above are employed to form a synthetic polypeptide, it may not be 
necessary to change the order in which those segments are linked together relative to the 
order in which corresponding segments are normally present in the parent polypeptide. In 
this regard, such overlapping segments when linked together in the synthetic polypeptide 
can adopt a different structure relative to the structure of the parent polypeptide, wherein 
the different structure does not provide for one or more functions associated with the 
parent polypeptide. For example, in the case of four segments A-B-C-D each spanning 30 
contiguous amino acids of the parent polypeptide and having a 10-amino acid overlapping 
sequence with one or more adjacent segments, the synthetic polypeptide will have 
duplicated 10-amino acid sequences bridging segments A-B, B-C and C-D. The presence 
of these duplicated sequences may be sufficient to render a different structure and to 
abrogate or alter function relative to the parent polypeptide. 
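
The four-segment example above can be sketched in a few lines. The code cuts a parent sequence into 30-residue segments that overlap their neighbours by 10 residues, links them in parent order, and collects the sequences on either side of each junction. The 90-residue parent is randomly generated purely for illustration, and the function name and default sizes are assumptions; real parent sequences and other size choices would be substituted in practice.

```python
import random


def overlapping_segments(parent, size=30, overlap=10):
    """Split `parent` into segments of `size` residues, each sharing
    `overlap` residues with the next segment (a hypothetical helper)."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    segments = []
    for start in range(0, len(parent), step):
        segments.append(parent[start:start + size])
        if start + size >= len(parent):  # last segment reaches the end
            break
    return segments


# Illustrative 90-residue parent: four 30-residue segments with 10-residue overlaps.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
parent = "".join(random.Random(0).choices(AMINO_ACIDS, k=90))

segments = overlapping_segments(parent)   # A, B, C, D
synthetic = "".join(segments)             # linked in parent order A-B-C-D

# The last 10 residues of each segment are repeated at the start of the next one.
junctions = [(a[-10:], b[:10]) for a, b in zip(segments, segments[1:])]
print(len(parent), len(synthetic), len(junctions))
```

Because each 10-residue overlap appears twice in succession, the linked product is 120 residues long rather than 90, so its sequence differs from the parent even though the segment order is unchanged.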
In a preferred embodiment, segment size is about 30 amino acids and sequence 
overlap at one or both ends of a respective segment is about 15 amino acids. However, it 
will be understood that other suitable segment sizes and sequence overlap sizes are 
contemplated by the present invention, which can be readily ascertained by persons of skill 
in the art. 

It is preferable but not necessary to utilise all the segments of the parent 
polypeptide in the construction of the synthetic polypeptide. Suitably, at least 30%, 
preferably at least 40%, more preferably at least 50%, even more preferably at least 60%, 
even more preferably at least 70%, even more preferably at least 80% and still even more 
preferably at least 90% of the parent polypeptide sequence is used in the construction of 
the synthetic polypeptide. However, it will be understood that the more sequence 
information from a parent polypeptide that is utilised to construct the synthetic 
polypeptide, the greater the population coverage will be of the synthetic polypeptide as an 
immunogen. Preferably, no sequence information from the parent polypeptide is excluded 
(e.g., because of an apparent lack of immunological epitopes). 

Persons of skill in the art will appreciate that when preparing a synthetic 
polypeptide against a pathogenic organism (e.g., a virus) or a cancer, it may be preferable 
to use sequence information from a plurality of different polypeptides expressed by the 
organism or the cancer. Accordingly, in a preferred embodiment, segments from a plurality 
of different polypeptides are linked together to form a synthetic polypeptide according to 
the invention. It is preferable in this respect to utilise as many parent polypeptides as 
possible from, or in relation to, a particular source in the construction of the synthetic 
polypeptide. The source of parent polypeptides includes, but is not limited to, a pathogenic 
organism and a cancer. Suitably, at least about 30%, preferably at least 40%, more 
preferably at least 50%, even more preferably at least 60%, even more preferably at least 
70%, even more preferably at least 80% and still even more preferably at least 90% of the 
parent polypeptides expressed by the source are used in the construction of the synthetic 
polypeptide. Preferably, parent polypeptides from a virus include, but are not restricted to, 
latent polypeptides, regulatory polypeptides or polypeptides expressed early during their 
replication cycle. Suitably, parent polypeptides from a parasite or bacterium include, but 
are not restricted to, secretory polypeptides and polypeptides expressed on the surface of 
the parasite or bacterium. It is preferred that parent polypeptides from a cancer or tumour are 
cancer specific polypeptides. 

Suitably, hypervariable sequences within the parent polypeptide are excluded
from the construction of the synthetic polypeptide.

The synthetic polypeptides of the invention may be prepared by any suitable
procedure known to those of skill in the art. For example, the polypeptide may be
synthesised using solution synthesis or solid phase synthesis as described, for example, in
Chapter 9 of Atherton and Shephard (1989, Solid Phase Peptide Synthesis: A Practical
Approach. IRL Press, Oxford) and in Roberge et al. (1995, Science 269: 202). Syntheses
may employ, for example, either t-butyloxycarbonyl (t-Boc) or
9-fluorenylmethyloxycarbonyl (Fmoc) chemistries (see Chapter 9.1 of Coligan et al.,
CURRENT PROTOCOLS IN PROTEIN SCIENCE, John Wiley & Sons, Inc. 1995-1997;
Stewart and Young, 1984, Solid Phase Peptide Synthesis, 2nd ed. Pierce Chemical Co.,
Rockford, Ill.; and Atherton and Shephard, supra).

Alternatively, the polypeptides may be prepared by a procedure including the
steps of:

(a) preparing a synthetic construct including a synthetic polynucleotide encoding
a synthetic polypeptide wherein said synthetic polynucleotide is operably linked to a
regulatory polynucleotide, wherein said synthetic polypeptide comprises a plurality of
different segments of a parent polypeptide, wherein said segments are linked together
in a different relationship relative to their linkage in the parent polypeptide;

(b) introducing the synthetic construct into a suitable host cell;

(c) culturing the host cell to express the synthetic polypeptide from said synthetic
construct; and

(d) isolating the synthetic polypeptide.

The synthetic construct is preferably in the form of an expression vector. For
example, the expression vector can be a self-replicating extra-chromosomal vector such as
a plasmid, or a vector that integrates into a host genome. Typically, the regulatory
polynucleotide may include, but is not limited to, promoter sequences, leader or signal
sequences, ribosomal binding sites, transcriptional start and stop sequences, translational
start and termination sequences, and enhancer or activator sequences. Constitutive or
inducible promoters as known in the art are contemplated by the invention. The promoters
may be either naturally occurring promoters, or hybrid promoters that combine elements of
more than one promoter. The regulatory polynucleotide will generally be appropriate for
the host cell used for expression. Numerous types of appropriate expression vectors and
suitable regulatory polynucleotides are known in the art for a variety of host cells.

In a preferred embodiment, the expression vector contains a selectable marker
gene to allow the selection of transformed host cells. Selection genes are well known in the
art and will vary with the host cell used.

The expression vector may also include a fusion partner (typically provided by the
expression vector) so that the synthetic polypeptide of the invention is expressed as a
fusion polypeptide with said fusion partner. The main advantage of fusion partners is that
they assist identification and/or purification of said fusion polypeptide. In order to express
said fusion polypeptide, it is necessary to ligate a polynucleotide according to the invention
into the expression vector so that the translational reading frames of the fusion partner and
the polynucleotide coincide.

Well known examples of fusion partners include, but are not limited to,
glutathione-S-transferase (GST), Fc portion of human IgG, maltose binding protein (MBP)
and hexahistidine (HIS6), which are particularly useful for isolation of the fusion
polypeptide by affinity chromatography. For the purposes of fusion polypeptide
purification by affinity chromatography, relevant matrices for affinity chromatography are
glutathione-, amylose-, and nickel- or cobalt-conjugated resins respectively. Many such
matrices are available in "kit" form, such as the QIAexpress™ system (Qiagen) useful with
(HIS6) fusion partners and the Pharmacia GST purification system. In a preferred
embodiment, the recombinant polynucleotide is expressed in the commercial vector
pFLAG™.

Another fusion partner well known in the art is green fluorescent protein (GFP).
This fusion partner serves as a fluorescent "tag" which allows the fusion polypeptide of the
invention to be identified by fluorescence microscopy or by flow cytometry. The GFP tag
is useful when assessing subcellular localisation of a fusion polypeptide of the invention,
or for isolating cells which express a fusion polypeptide of the invention. Flow cytometric
methods such as fluorescence activated cell sorting (FACS) are particularly useful in this
latter application. Preferably, the fusion partners also have protease cleavage sites, such as
for Factor Xa, Thrombin and inteins (protein introns), which allow the relevant protease to
partially digest the fusion polypeptide of the invention and thereby liberate the
recombinant polypeptide of the invention therefrom. The liberated polypeptide can then be
isolated from the fusion partner by subsequent chromatographic separation. Fusion
partners according to the invention also include within their scope "epitope tags", which
are usually short peptide sequences for which a specific antibody is available. Well known
examples of epitope tags for which specific monoclonal antibodies are readily available
include c-Myc, influenza virus haemagglutinin and FLAG tags. Alternatively, a fusion
partner may be provided to promote other forms of immunity. For example, the fusion
partner may be an antigen-binding molecule that is immuno-interactive with a
conformational epitope on a target antigen or with a post-translational modification of a
target antigen (e.g., an antigen-binding molecule that is immuno-interactive with a
glycosylated target antigen).

The step of introducing the synthetic construct into the host cell may be effected
by any suitable method including transfection and transformation, the choice of which will
be dependent on the host cell employed. Such methods are well known to those of skill in
the art.

Synthetic polypeptides of the invention may be produced by culturing a host cell
transformed with the synthetic construct. The conditions appropriate for protein expression
will vary with the choice of expression vector and the host cell. This is easily ascertained
by one skilled in the art through routine experimentation.

Suitable host cells for expression may be prokaryotic or eukaryotic. One preferred
host cell for expression of a polypeptide according to the invention is a bacterium. The
bacterium used may be Escherichia coli. Alternatively, the host cell may be an insect cell
such as, for example, SF9 cells that may be utilised with a baculovirus expression system.

The synthetic polypeptide may be conveniently prepared by a person skilled in the
art using standard protocols as for example described in Sambrook, et al., MOLECULAR
CLONING. A LABORATORY MANUAL (Cold Spring Harbor Press, 1989), in particular
Sections 16 and 17; Ausubel et al., CURRENT PROTOCOLS IN MOLECULAR
BIOLOGY (John Wiley & Sons, Inc. 1994-1998), in particular Chapters 10 and 16; and
Coligan et al., CURRENT PROTOCOLS IN PROTEIN SCIENCE (John Wiley & Sons,
Inc. 1995-1997), in particular Chapters 1, 5 and 6.

The amino acids of the synthetic polypeptide can be any non-naturally occurring
or any naturally occurring amino acid. Examples of unnatural amino acids and derivatives
that may be incorporated during peptide synthesis include, but are not limited to,
4-amino butyric acid, 6-aminohexanoic acid, 4-amino-3-hydroxy-5-phenylpentanoic acid,
4-amino-3-hydroxy-6-methylheptanoic acid, t-butylglycine, norleucine, norvaline,
phenylglycine, ornithine, sarcosine, 2-thienyl alanine and/or D-isomers of amino acids. A
list of unnatural amino acids contemplated by the present invention is shown in TABLE C.



TABLE C

Non-conventional amino acid                         Non-conventional amino acid

α-aminobutyric acid                                 L-N-methylalanine
α-amino-α-methylbutyrate                            L-N-methylarginine
aminocyclopropane-carboxylate                       L-N-methylasparagine
aminoisobutyric acid                                L-N-methylaspartic acid
aminonorbornyl-carboxylate                          L-N-methylcysteine
cyclohexylalanine                                   L-N-methylglutamine
cyclopentylalanine                                  L-N-methylglutamic acid
L-N-methylisoleucine                                L-N-methylhistidine
D-alanine                                           L-N-methylleucine
D-arginine                                          L-N-methyllysine
D-aspartic acid                                     L-N-methylmethionine
D-cysteine                                          L-N-methylnorleucine
D-glutamate                                         L-N-methylnorvaline
D-glutamic acid                                     L-N-methylornithine
D-histidine                                         L-N-methylphenylalanine
D-isoleucine                                        L-N-methylproline
D-leucine                                           L-N-methylserine
D-lysine                                            L-N-methylthreonine
D-methionine                                        L-N-methyltryptophan
D-ornithine                                         L-N-methyltyrosine
D-phenylalanine                                     L-N-methylvaline
D-proline                                           L-N-methylethylglycine
D-serine                                            L-N-methyl-t-butylglycine
D-threonine                                         L-norleucine
D-tryptophan                                        L-norvaline
D-tyrosine                                          α-methyl-aminoisobutyrate
D-valine                                            α-methyl-γ-aminobutyrate
D-α-methylalanine                                   α-methylcyclohexylalanine
D-α-methylarginine                                  α-methylcyclopentylalanine
D-α-methylasparagine                                α-methyl-α-naphthylalanine
D-α-methylaspartate                                 α-methylpenicillamine
D-α-methylcysteine                                  N-(4-aminobutyl)glycine
D-α-methylglutamine                                 N-(2-aminoethyl)glycine
D-α-methylhistidine                                 N-(3-aminopropyl)glycine
D-α-methylisoleucine                                N-amino-α-methylbutyrate
D-α-methylleucine                                   α-naphthylalanine
D-α-methyllysine                                    N-benzylglycine
D-α-methylmethionine                                N-(2-carbamylethyl)glycine
D-α-methylornithine                                 N-(carbamylmethyl)glycine
D-α-methylphenylalanine                             N-(2-carboxyethyl)glycine
D-α-methylproline                                   N-(carboxymethyl)glycine
D-α-methylserine                                    N-cyclobutylglycine
D-α-methylthreonine                                 N-cycloheptylglycine
D-α-methyltryptophan                                N-cyclohexylglycine
D-α-methyltyrosine                                  N-cyclodecylglycine
L-α-methylleucine                                   L-α-methyllysine
L-α-methylmethionine                                L-α-methylnorleucine
L-α-methylnorvaline                                 L-α-methylornithine
L-α-methylphenylalanine                             L-α-methylproline
L-α-methylserine                                    L-α-methylthreonine
L-α-methyltryptophan                                L-α-methyltyrosine
L-α-methylvaline                                    L-N-methylhomophenylalanine
N-(N-(2,2-diphenylethyl)carbamylmethyl)glycine      N-(N-(3,3-diphenylpropyl)carbamylmethyl)glycine
1-carboxy-1-(2,2-diphenyl-ethylamino)cyclopropane





The invention also contemplates modifying the synthetic polypeptides of the
invention using ordinary molecular biological techniques so as to alter their resistance to
proteolytic degradation or to optimise solubility properties or to render them more suitable
as an immunogenic agent.

3. Preparation of synthetic polynucleotides of the invention 

The invention contemplates synthetic polynucleotides encoding the synthetic
polypeptides as for example described in Section 2 supra. Polynucleotides encoding
segments of a parent polypeptide can be produced by any suitable technique. For example,
such polynucleotides can be synthesised de novo using readily available machinery.
Sequential synthesis of DNA is described, for example, in U.S. Patent No 4,293,652.
Instead of de novo synthesis, recombinant techniques may be employed including use of
restriction endonucleases to cleave a polynucleotide encoding at least a segment of the
parent polypeptide and use of ligases to ligate together in frame a plurality of cleaved
polynucleotides encoding different segments of the parent polypeptide. Suitable
recombinant techniques are described for example in the relevant sections of Ausubel, et
al. (supra) and of Sambrook, et al. (supra) which are incorporated herein by reference.
Preferably, the synthetic polynucleotide is constructed using splicing by overlapping
extension (SOEing) as for example described by Horton et al. (1990, Biotechniques 8(5):
528-535; 1995, Mol Biotechnol 3(2): 93-99; and 1997, Methods Mol Biol 67: 141-149).
However, it should be noted that the present invention is not dependent on, and not
directed to, any one particular technique for constructing the synthetic construct.

Various modifications to the synthetic polynucleotides may be introduced as a
means of increasing intracellular stability and half-life. Possible modifications include but
are not limited to the addition of flanking sequences of ribo- or deoxy- nucleotides to the 5'
and/or 3' ends of the molecule or the use of phosphorothioate or 2' O-methyl rather than
phosphodiester linkages within the oligodeoxyribonucleotide backbone.

The invention therefore contemplates a method of producing a synthetic
polynucleotide as broadly described above, comprising linking together in the same
reading frame at least two nucleic acid sequences encoding different segments of a parent
polypeptide to form a synthetic polynucleotide, which encodes a synthetic polypeptide
according to the invention. Suitably, nucleic acid sequences encoding at least 10 segments,
preferably at least 20 segments, more preferably at least 40 segments and even more
preferably at least 100 segments of a parent polypeptide are employed to produce the
synthetic polynucleotide.

Preferably, the method further comprises selecting segments of the parent
polypeptide, reverse translating the selected segments and preparing nucleic acid
sequences encoding the selected segments. It is preferred that the method further comprises
randomly linking the nucleic acid sequences together to form the synthetic polynucleotide.
The nucleic acid sequences may be oligonucleotides or polynucleotides.

Suitably, segments are selected on the basis of size. Additionally, or in the
alternative, segments are selected such that they have partial sequence identity or
homology (i.e., sequence overlap) with one or more other segments. A number of factors
can influence segment size and sequence overlap as mentioned above. In the case of
sequence overlap, large amounts of duplicated nucleic acid sequences can sometimes result
in sections of nucleic acid being lost during nucleic acid amplification (e.g., polymerase
chain reaction, PCR) of such sequences, recombinant plasmid propagation in a bacterial
host or during amplification of recombinant viruses containing such sequences.
Accordingly, in a preferred embodiment, nucleic acid sequences encoding segments having
sequence identity or homology with one or more other encoded segments are not linked
together in an arrangement in which the identical or homologous sequences are contiguous.
Also, it is preferable that different codons are used to encode a specific amino acid in a
duplicated region. In this context, an amino acid of a parent polypeptide sequence is
preferably reverse translated to provide a codon which, in the context of adjacent or local
sequence elements, has a lower propensity of forming an undesirable sequence (e.g., a
duplicated sequence or a palindromic sequence) that is refractory to the execution of a task
(e.g., cloning or sequencing). Alternatively, segments may be selected such that they
contain a carboxyl terminal leucine residue or such that reverse translated sequences
encoding the segments contain restriction enzyme sites for convenient splicing of the
reverse translated sequences.
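
Purely as a non-limiting sketch, reverse translation of selected segments and their random
linkage, subject to the preference that overlapping segments are not placed next to one
another, could be expressed in Python as below. The names CODON, reverse_translate,
random_order and build_polynucleotide are hypothetical. The sketch assumes that segments
are supplied in their parent order so that neighbours in the list are the overlapping
ones, uses a single illustrative codon per amino acid, and conservatively rejects any
order that places parent neighbours side by side in either orientation.

```python
import random

# One codon per amino acid; this particular choice is illustrative only.
CODON = {
    "A": "GCC", "C": "TGC", "D": "GAC", "E": "GAG", "F": "TTC",
    "G": "GGC", "H": "CAC", "I": "ATC", "K": "AAG", "L": "CTG",
    "M": "ATG", "N": "AAC", "P": "CCC", "Q": "CAG", "R": "AGA",
    "S": "AGC", "T": "ACC", "V": "GTG", "W": "TGG", "Y": "TAC",
}

def reverse_translate(segment: str) -> str:
    """Return a DNA sequence encoding `segment` (one-letter amino acid code)."""
    return "".join(CODON[aa] for aa in segment)

def random_order(n_segments: int, seed=None, max_tries: int = 10_000) -> list[int]:
    """Shuffle segment indices until no two parent neighbours are adjacent."""
    rng = random.Random(seed)
    order = list(range(n_segments))
    for _ in range(max_tries):
        rng.shuffle(order)
        if all(abs(a - b) != 1 for a, b in zip(order, order[1:])):
            return list(order)
    raise RuntimeError("no acceptable order found; try more segments or tries")

def build_polynucleotide(segments: list[str], seed=None) -> tuple[list[int], str]:
    """Reverse translate each segment and link them in a scrambled order."""
    order = random_order(len(segments), seed)
    return order, "".join(reverse_translate(segments[i]) for i in order)
```

Because every segment contributes a whole number of codons, the linked sequences remain
in a single reading frame. No acceptable order exists for two or three segments under the
rule used here, in which case the sketch raises an error rather than returning a
compromised arrangement.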

The method optionally further comprises linking a spacer oligonucleotide
encoding at least one spacer residue between segment-encoding nucleic acids. Such spacer
residue(s) may be advantageous in ensuring that epitopes within the segments are
processed and presented efficiently. Preferably, the spacer oligonucleotide encodes 2 to 3
spacer residues. The spacer residue is suitably a neutral amino acid, which is preferably
alanine.
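
As an illustrative sketch only, the insertion of alanine spacers between segment-encoding
nucleic acids might look as follows. The function name link_with_spacers is hypothetical,
and the use of GCC as the alanine codon is an assumption made for the example; any
alanine codon could be substituted.

```python
def link_with_spacers(segment_dnas: list[str],
                      n_spacer_residues: int = 2,
                      spacer_codon: str = "GCC") -> str:
    """Join in-frame coding sequences with a run of spacer codons between them."""
    if any(len(dna) % 3 for dna in segment_dnas):
        raise ValueError("every segment must be a whole number of codons")
    spacer = spacer_codon * n_spacer_residues   # e.g. GCCGCC encodes Ala-Ala
    return spacer.join(segment_dnas)
```

Since both the segments and the spacer are whole codons, the reading frame is preserved
across every junction.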

Optionally, the method further comprises linking in the same reading frame as
other segment-containing nucleic acid sequences at least one variant nucleic acid sequence
which encodes a variant segment having a homologous but not identical amino acid
sequence relative to other encoded segments. Suitably, the variant segment comprises
conserved and/or non-conserved amino acid differences relative to one or more other
encoded segments. Such differences may correspond to polymorphisms as discussed
above. In a preferred embodiment, degenerate bases are designed or built in to the at least
one variant nucleic acid sequence to give rise to all desired homologous sequences.
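
One possible way of deriving such degenerate bases, offered here only as a sketch, is to
collapse aligned variant sequences into a single sequence written in the standard IUPAC
nucleotide ambiguity codes. The function name degenerate_consensus is hypothetical, and
the sketch assumes that the variant sequences are already aligned and of equal length.

```python
# Standard IUPAC nucleotide ambiguity codes and the bases each one stands for.
_IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
_CODE_FOR = {frozenset(bases): code for code, bases in _IUPAC_BASES.items()}

def degenerate_consensus(variants: list[str]) -> str:
    """Collapse equal-length DNA variants into one degenerate sequence."""
    if len({len(v) for v in variants}) != 1:
        raise ValueError("variants must be non-empty and of equal length")
    # Each alignment column maps to the code covering every base seen in it.
    return "".join(_CODE_FOR[frozenset(column)] for column in zip(*variants))
```

For instance, GCC (alanine) and GTC (valine) collapse to GYC. It should be borne in mind
that when several positions are degenerate the resulting pool can also contain
combinations that were not among the input variants.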

When a large number of polymorphisms is intended to be covered, it is preferred
that multiple synthetic polynucleotides are constructed rather than a single synthetic
polynucleotide, which encodes all variant segments. For example, if there is less than 85%
homology between polymorphic polypeptides, then it is preferred that more than one
synthetic polynucleotide is constructed.
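
The 85% guideline above can be sketched as a simple check. The example below is an
approximation under stated assumptions: it treats homology as percent identity between
pre-aligned, ungapped sequences of equal length, and the names percent_identity and
needs_separate_constructs are hypothetical.

```python
from itertools import combinations

def percent_identity(a: str, b: str) -> float:
    """Percent of aligned positions at which two equal-length sequences agree."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)

def needs_separate_constructs(variants: list[str], threshold: float = 85.0) -> bool:
    """True if any pair of polymorphic sequences falls below the threshold."""
    return any(percent_identity(a, b) < threshold
               for a, b in combinations(variants, 2))
```

A practical implementation would more likely rely on a proper alignment and a similarity
matrix; the pairwise identity used here is only the simplest stand-in.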

Preferably, the method further comprises optimising the codon composition of the
synthetic polynucleotide such that it is translated efficiently by a host cell. In this regard, it
is well known that the translational efficiency of different codons varies between
organisms and that such differences in codon usage can be utilised to enhance the level of
protein expression in a particular organism. In this regard, reference may be made to Seed
et al. (International Application Publication No WO 96/09378) who disclose the
replacement of existing codons in a parent polynucleotide with synonymous codons to
enhance expression of viral polypeptides in mammalian host cells. Preferably, the first or
second most frequently used codons are employed for codon optimisation.
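
A minimal sketch of choosing between the first and second most frequently used codons
is given below. The table RANKED_CODONS is a placeholder covering only a few amino acids
and is not measured codon usage data for any organism; in practice it would be replaced
by a ranked table for the intended host. The function name encode is likewise
hypothetical.

```python
# Placeholder ranking: for each amino acid, codons listed from most to less
# preferred. Replace with ranked codon usage data for the intended host cell.
RANKED_CODONS = {
    "A": ["GCC", "GCT"],
    "K": ["AAG", "AAA"],
    "L": ["CTG", "CTC"],
    "M": ["ATG"],          # methionine has a single codon
    "W": ["TGG"],          # tryptophan has a single codon
}

def encode(peptide: str, rank: int = 0) -> str:
    """Encode `peptide` with the codon at position `rank` (0 = most frequent).

    Amino acids with fewer alternatives fall back to their last listed codon.
    """
    codons = []
    for aa in peptide:
        choices = RANKED_CODONS[aa]
        codons.append(choices[min(rank, len(choices) - 1)])
    return "".join(codons)
```

Encoding the first copy of a duplicated region with rank 0 and the second copy with
rank 1 gives the same amino acids from different codons, which is one way of combining
codon optimisation with the preference for distinct codons in duplicated regions.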

Preferably, gene splicing by overlap extension or "gene SOEing" (supra), which is
a PCR-based method of recombining DNA sequences without reliance on restriction sites
and of directly generating mutated DNA fragments in vitro, is employed for the
construction of the synthetic polynucleotide. By modifying the sequences incorporated into
the 5'-ends of the primers, any pair of PCR products can be made to share a common
sequence at one end. Under PCR conditions, the common sequence allows strands from
two different fragments to hybridise to one another, forming an overlap. Extension of this
overlap by DNA polymerase yields a recombinant molecule. However, a problem with
long synthetic constructs is that mutations generally incorporate into amplified products
during synthesis. In this instance, it is preferred that resolvase treatment is employed at
various steps of the synthesis. Resolvases are bacteriophage-encoded endonucleases which
recognise disruptions or mispairing of double stranded DNA and are primarily used by
bacteriophages to resolve Holliday junctions (Mizuuchi, 1982; Youil et al., 1995). For
example, T7 endonuclease I can be employed in synthetic DNA constructions to recognise
mutations and cleave corrupted dsDNA. The mutated DNA strands are then hybridised to
non-mutant or correct DNA sequences, which results in a mispairing of DNA bases. The
mispaired bases are recognised by the resolvase, which then cleaves the DNA nearby
leaving only correctly hybridised sequences intact. Preferably a thermostable resolvase
enzyme is employed during splicing or amplification so that errors are not incorporated in
downstream synthesis products.
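
The outcome of overlap extension can be modelled in a simplified way, as sketched below.
The model works on a single strand only and ignores primers, strand orientation and
polymerase behaviour; the function name splice_by_overlap and the default minimum overlap
of 15 nucleotides are assumptions made for the illustration.

```python
def splice_by_overlap(left: str, right: str, min_overlap: int = 15) -> str:
    """Join two fragments that share a common sequence at their junction.

    The longest suffix of `left` that equals a prefix of `right` is treated as
    the overlap and is kept once in the recombinant product.
    """
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return left + right[k:]
    raise ValueError("fragments do not share a sufficiently long common end")
```

Applying the function repeatedly to an ordered list of fragments gives the sequence that
successive rounds of splicing would be expected to produce.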

Synthetic polynucleotides according to the invention can be operably linked to a
regulatory polynucleotide in the form of a synthetic construct as for example described in
Section 2 supra. Synthetic constructs of the invention have utility inter alia as nucleic acid
vaccines. The choice of regulatory polynucleotide and synthetic construct will depend on
the intended host.

Exemplary expression vectors for expression of a synthetic polypeptide according
to the invention include, but are not restricted to, modified Ankara Vaccinia virus as for
example described by Allen et al. (2000, J. Immunol. 164(9): 4968-4978), fowlpox virus as
for example described by Boyle and Coupar (1988, Virus Res. 10: 343-356) and the herpes
simplex amplicons described for example by Fong et al. in U.S. Patent No. 6,051,428.
Alternatively, Adenovirus and Epstein-Barr virus vectors, which are preferably capable of
accepting large amounts of DNA or RNA sequence information, can be used.

Preferred promoter sequences that can be utilised for expression of synthetic
polypeptides include the P7.5 or PE/L promoters as for example disclosed by Kumar and
Boyle (1990, Virology 179: 151-158), CMV and RSV promoters.

The synthetic construct optionally further includes a nucleic acid sequence
encoding an immunostimulatory molecule. The immunostimulatory molecule may be a
fusion partner of the synthetic polypeptide. Alternatively, the immunostimulatory molecule
may be translated separately from the synthetic polypeptide. Preferably, the
immunostimulatory molecule comprises a general immunostimulatory peptide sequence.
For example, the immunostimulatory peptide sequence may comprise a domain of an
invasin protein (Inv) from the bacteria Yersinia spp as for example disclosed by Brett et al.
(1993, Eur. J. Immunol. 23: 1608-1614). This immune stimulatory property results from
the capability of this invasin domain to interact with the β1 integrin molecules present on T
cells, particularly activated immune or memory T cells. A preferred embodiment of the
invasin domain (Inv) for linkage to a synthetic polypeptide has been previously described
in U.S. Pat. No. 5,759,551. The said Inv domain has the sequence: Thr-Ala-Lys-Ser-Lys-
Lys-Phe-Pro-Ser-Tyr-Thr-Ala-Thr-Tyr-Gln-Phe [SEQ ID NO: 1467] or is an immune
stimulatory homologue thereof from the corresponding region in another Yersinia species
invasin protein. Such homologues thus may contain substitutions, deletions or insertions of
amino acid residues to accommodate strain to strain variation, provided that the
homologues retain immune stimulatory properties. The general immunostimulatory
sequence may optionally be linked to the synthetic polypeptide by a spacer sequence.

In an alternate embodiment, the immunostimulatory molecule may comprise an
immunostimulatory membrane or soluble molecule, which is suitably a T cell
co-stimulatory molecule. Preferably, the T cell co-stimulatory molecule is a B7 molecule or a
biologically active fragment thereof, or a variant or derivative of these. The B7 molecule
includes, but is not restricted to, B7-1 and B7-2. Preferably, the B7 molecule is B7-1.
Alternatively, the T cell co-stimulatory molecule may be an ICAM molecule such as
ICAM-1 and ICAM-2.

In another embodiment, the immunostimulatory molecule can be a cytokine,
which includes, but is not restricted to, an interleukin, a lymphokine, tumour necrosis
factor and an interferon. Alternatively, the immunostimulatory molecule may comprise an
immunomodulatory oligonucleotide as for example disclosed by Krieg in U.S. Patent No.
6,008,200.

Suitably, the size of the synthetic polynucleotide does not exceed the ability of
host cells to transcribe, translate or proteolytically process and present epitopes to the
immune system. Practitioners in the art will also recognise that the size of the synthetic
polynucleotide can impact on the capacity of an expression vector to express the synthetic
polynucleotide in a host cell. In this connection, it is known that the efficacy of DNA
vaccination reduces with expression vectors greater than 20 kb. In such situations it is
preferred that a larger number of smaller synthetic constructs is utilised rather than a single
large synthetic construct.
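
The preference for several smaller constructs over one large construct can be illustrated
by a simple greedy partition. This is a sketch under assumptions: the function name
partition_into_constructs is hypothetical, the 20 kb figure is used as a hard limit purely
for the example, and the backbone size of the vector is supplied by the user.

```python
def partition_into_constructs(segment_lengths_bp: list[int],
                              backbone_bp: int,
                              max_vector_bp: int = 20_000) -> list[list[int]]:
    """Group consecutive inserts so that backbone + inserts stays within the limit.

    Returns a list of constructs, each a list of indices into the input.
    """
    capacity = max_vector_bp - backbone_bp
    if capacity <= 0:
        raise ValueError("backbone alone reaches the size limit")
    constructs, current, used = [], [], 0
    for index, length in enumerate(segment_lengths_bp):
        if length > capacity:
            raise ValueError(f"insert {index} does not fit in any construct")
        if used + length > capacity:
            constructs.append(current)      # close the construct that is full
            current, used = [], 0
        current.append(index)
        used += length
    if current:
        constructs.append(current)
    return constructs
```

For a 5 kb backbone and inserts of 6, 7, 5 and 4 kb, the sketch places the first two
inserts in one construct and the last two in another.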

4. Immunopotentiating compositions 

The invention also contemplates a composition, comprising an
immunopotentiating agent selected from the group consisting of a synthetic polypeptide as
described in Section 2, and a synthetic polynucleotide or a synthetic construct as described
in Section 3, together with a pharmaceutically acceptable carrier. One or more
immunopotentiating agents can be used as actives in the preparation of
immunopotentiating compositions. Such preparation uses routine methods known to
persons skilled in the art. Typically, such compositions are prepared as injectables, either
as liquid solutions or suspensions; solid forms suitable for solution in, or suspension in,
liquid prior to injection may also be prepared. The preparation may also be emulsified. The
active immunogenic ingredients are often mixed with excipients that are pharmaceutically
acceptable and compatible with the active ingredient. Suitable excipients are, for example,
water, saline, dextrose, glycerol, ethanol, or the like and combinations thereof. In addition,
if desired, the vaccine may contain minor amounts of auxiliary substances such as wetting
or emulsifying agents, pH buffering agents, and/or adjuvants that enhance the effectiveness
of the vaccine. Examples of adjuvants which may be effective include but are not limited
to: aluminium hydroxide, N-acetyl-muramyl-L-threonyl-D-isoglutamine (thr-MDP),
N-acetyl-nor-muramyl-L-alanyl-D-isoglutamine (CGP 11637, referred to as nor-MDP),
N-acetylmuramyl-L-alanyl-D-isoglutaminyl-L-alanine-2-(1'-2'-dipalmitoyl-sn-glycero-3-
hydroxyphosphoryloxy)-ethylamine (CGP 1983 A, referred to as MTP-PE), and RIBI,
which contains three components extracted from bacteria, monophosphoryl lipid A,
trehalose dimycolate and cell wall skeleton (MPL+TDM+CWS) in a 2% squalene/Tween
80 emulsion. For example, the effectiveness of an adjuvant may be determined by
measuring the amount of antibodies resulting from the administration of the composition,
wherein those antibodies are directed against one or more antigens presented by the treated
cells of the composition.

The immunopotentiating agents may be formulated into a composition as neutral
or salt forms. Pharmaceutically acceptable salts include the acid addition salts (formed
with free amino groups of the peptide) and which are formed with inorganic acids such as,
for example, hydrochloric or phosphoric acids, or organic acids such as acetic, oxalic,
tartaric, maleic, and the like. Salts formed with the free carboxyl groups may also be
derived from inorganic bases such as, for example, sodium, potassium, ammonium,
calcium, or ferric hydroxides, and such organic bases as isopropylamine, trimethylamine,
2-ethylamino ethanol, histidine, procaine, and the like.

If desired, devices or compositions containing the immunopotentiating agents
suitable for sustained or intermittent release could be, in effect, implanted in the body or
topically applied thereto for the relatively slow release of such materials into the body.

The compositions are conventionally administered parenterally, by injection, for
example, either subcutaneously or intramuscularly. Additional formulations which are
suitable for other modes of administration include suppositories and, in some cases, oral
formulations. For suppositories, traditional binders and carriers may include, for example,
polyalkylene glycols or triglycerides; such suppositories may be formed from mixtures
containing the active ingredient in the range of 0.5% to 10%, preferably 1%-2%. Oral
formulations include such normally employed excipients as, for example, pharmaceutical
grades of mannitol, lactose, starch, magnesium carbonate, and the like. These compositions
take the form of solutions, suspensions, tablets, pills, capsules, sustained release
formulations or powders and contain 10%-95% of active ingredient, preferably 25%-70%.

Administration of the gene therapy construct to said mammal, preferably a
human, may include delivery via direct oral intake, systemic injection, or delivery to
selected tissue(s) or cells, or indirectly via delivery to cells isolated from the mammal or a
compatible donor. An example of the latter approach would be stem cell therapy, wherein
isolated stem cells having potential for growth and differentiation are transfected with the
vector comprising the Sox18 nucleic acid. The stem cells are cultured for a period and then
transferred to the mammal being treated.

With regard to nucleic acid based compositions, all modes of delivery of such
compositions are contemplated by the present invention. Delivery of these compositions to
cells or tissues of an animal may be facilitated by microprojectile bombardment, liposome
mediated transfection (e.g., lipofectin or lipofectamine), electroporation, calcium
phosphate or DEAE-dextran-mediated transfection, for example. In an alternate
embodiment, a synthetic construct may be used as a therapeutic or prophylactic
composition in the form of a "naked DNA" composition as is known in the art. A
discussion of suitable delivery methods may be found in Chapter 9 of CURRENT
PROTOCOLS IN MOLECULAR BIOLOGY (Eds. Ausubel et al.; John Wiley & Sons
Inc., 1997 Edition) or on the Internet site DNAvaccine.com. The compositions may be
administered by intradermal (e.g., using panjet™ delivery) or intramuscular routes.

The step of introducing the synthetic polynucleotide into a target cell will differ
depending on the intended use and species, and can involve one or more of non-viral and
viral vectors, cationic liposomes, retroviruses, and adenoviruses such as, for example,
described in Mulligan, R.C., (1993 Science 260 926-932) which is hereby incorporated by
reference. Such methods can include, for example:

A. Local application of the synthetic polynucleotide by injection (Wolff et al., 1990,
Science 247 1465-1468, which is hereby incorporated by reference), surgical
implantation, instillation or any other means. This method can also be used in
combination with local application by injection, surgical implantation, instillation or
any other means, of cells responsive to the protein encoded by the synthetic
polynucleotide so as to increase the effectiveness of that treatment. This method can
also be used in combination with local application by injection, surgical implantation,
instillation or any other means, of another factor or factors required for the activity of
said protein.

B. General systemic delivery by injection of DNA (Calabretta et al., 1993, Cancer Treat.
Rev. 19 169-179, which is incorporated herein by reference), or RNA, alone or in
combination with liposomes (Zhu et al., 1993, Science 261 209-212, which is
incorporated herein by reference), viral capsids or nanoparticles (Bertling et al., 1991,
Biotech. Appl. Biochem. 13 390-405, which is incorporated herein by reference) or any
other mediator of delivery. Improved targeting might be achieved by linking the
synthetic polynucleotide to a targeting molecule (the so-called "magic bullet" approach
employing, for example, an antibody), or by local application by injection, surgical
implantation or any other means, of another factor or factors required for the activity of
the protein encoded by said synthetic polynucleotide, or of cells responsive to said
protein.

C. Injection or implantation or delivery by any means, of cells that have been modified ex
vivo by transfection (for example, in the presence of calcium phosphate: Chen et al.,
1987, Mol. Cell Biochem. 7 2745-2752, or of cationic lipids and polyamines: Rose et
al., 1991, BioTech. 10 520-525, which articles are incorporated herein by reference),
infection, injection, electroporation (Shigekawa et al., 1988, BioTech. 6 742-751,
which is incorporated herein by reference) or any other way so as to increase the
expression of said synthetic polynucleotide in those cells. The modification can be
mediated by plasmid, bacteriophage, cosmid, viral (such as adenoviral or retroviral;
Mulligan, 1993, Science 260 926-932; Miller, 1992, Nature 357 455-460; Salmons et
al., 1993, Hum. Gen. Ther. 4 129-141, which articles are incorporated herein by
reference) or other vectors, or other agents of modification such as liposomes (Zhu et
al., 1993, Science 261 209-212, which is incorporated herein by reference), viral
capsids or nanoparticles (Bertling et al., 1991, Biotech. Appl. Biochem. 13 390-405,
which is incorporated herein by reference), or any other mediator of modification. The
use of cells as a delivery vehicle for genes or gene products has been described by Barr
et al., 1991, Science 254 1507-1512 and by Dhawan et al., 1991, Science 254 1509-1512,
which articles are incorporated herein by reference. Treated cells can be
delivered in combination with any nutrient, growth factor, matrix or other agent that
will promote their survival in the treated subject.

Also encompassed by the present invention is a method for treatment and/or
prophylaxis of a disease or condition, comprising administering to a patient in need of such
treatment a therapeutically effective amount of a composition as broadly described above.
The disease or condition may be caused by a pathogenic organism or a cancer as for
example described above.

In a preferred embodiment, the immunopotentiating composition of the invention
is suitable for treatment of, or prophylaxis against, a cancer. Cancers which could be
suitably treated in accordance with the practices of this invention include cancers of the
lung, breast, ovary, cervix, colon, head and neck, pancreas, prostate, stomach, bladder,
kidney, bone, liver, oesophagus, brain, testicle, uterus, melanoma and the various leukemias
and lymphomas.

In an alternate embodiment, the immunopotentiating composition is suitable for
treatment of, or prophylaxis against, a viral, bacterial or parasitic infection. Viral infections
contemplated by the present invention include, but are not restricted to, infections caused
by HIV, Hepatitis, Influenza, Japanese encephalitis virus, Epstein-Barr virus and
respiratory syncytial virus. Bacterial infections include, but are not restricted to, those
caused by Neisseria species, Meningococcal species, Haemophilus species, Salmonella
species, Streptococcal species, Legionella species and Mycobacterium species. Parasitic
infections encompassed by the invention include, but are not restricted to, those caused by
Plasmodium species, Schistosoma species, Leishmania species, Trypanosoma species,
Toxoplasma species and Giardia species.

The above compositions or vaccines may be administered in a manner compatible
with the dosage formulation, and in such amount as is therapeutically effective to alleviate
patients from the disease or condition or as is prophylactically effective to prevent
incidence of the disease or condition in the patient. The dose administered to a patient, in
the context of the present invention, should be sufficient to effect a beneficial response in a
patient over time such as a reduction or cessation of blood loss. The quantity of the
composition or vaccine to be administered may depend on the subject to be treated
inclusive of the age, sex, weight and general health condition thereof. In this regard,
precise amounts of the composition or vaccine for administration will depend on the
judgement of the practitioner. In determining the effective amount of the composition or
vaccine to be administered in the treatment of a disease or condition, the physician may
evaluate the progression of the disease or condition over time. In any event, those of skill
in the art may readily determine suitable dosages of the composition or vaccine of the
invention.

In a preferred embodiment, a DNA-based immunopotentiating agent (e.g., 100 µg)
is delivered intradermally into a patient at day 1 and at week 8 to prime the patient. A
recombinant poxvirus (e.g., at 10^ pfu/mL) from which substantially the same
immunopotentiating agent can be expressed is then delivered intradermally as a booster at
weeks 16 and 24, respectively.

The effectiveness of the immunisation may be assessed using any suitable
technique. For example, CTL lysis assays may be employed using stimulated splenocytes
or peripheral blood mononuclear cells (PBMC) on peptide coated or recombinant virus
infected cells using 51Cr labelled target cells. Such assays can be performed using for
example primate, mouse or human cells (Allen et al., 2000, J. Immunol. 164(9): 4968-4978;
also Woodberry et al., infra). Alternatively, the efficacy of the immunisation may be
monitored using one or more techniques including, but not limited to, HLA class I
tetramer staining of both fresh and stimulated PBMCs (see for example Allen et al.,
supra), proliferation assays (Allen et al., supra), Elispot™ assays and intracellular
IFN-gamma staining (Allen et al., supra), ELISA assays for linear B cell responses, and
Western blots of cell samples expressing the synthetic polynucleotides.

5. Computer related embodiments 

The design or construction of a synthetic polypeptide sequence or a synthetic
polynucleotide sequence according to the invention is suitably facilitated with the
assistance of a computer programmed with software, which inter alia fragments a parent
sequence into fragments, and which links those fragments together in a different
relationship relative to their linkage in the parent sequence. The ready use of a parent
sequence for the construction of a desired synthetic molecule according to the invention
requires that it be stored in a computer-readable format. Thus, in accordance with the
present invention, sequence data relating to a parent molecule (e.g., a parent polypeptide)
is stored in a machine-readable storage medium, which is capable of processing the data to
fragment the sequence of the parent molecule into fragments and to link together the
fragments in a different relationship relative to their linkage in the parent molecule.

Therefore, another embodiment of the present invention provides a
machine-readable data storage medium, comprising a data storage material encoded with machine
readable data which, when used by a machine programmed with instructions for using said
data, fragments a parent sequence into fragments, and links those fragments together in a
different relationship relative to their linkage in the parent sequence. In a preferred
embodiment of this type, a machine-readable data storage medium is provided that is
capable of reverse translating the sequence of a respective fragment to provide a nucleic
acid sequence encoding the fragment and to link together in the same reading frame each
of the nucleic acid sequences to provide a polynucleotide sequence that codes for a
polypeptide sequence in which said fragments are linked together in a different relationship
relative to their linkage in a parent polypeptide sequence.
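
By way of illustration only, the reverse translation and in-frame linkage referred to above may be sketched in Python as follows. The codon table is an assumption made for this sketch (one valid codon per amino acid) and is not the table of Figure 12; the function names are hypothetical.

```python
# Illustrative codon table: one valid codon per amino acid.
# ASSUMPTION: these choices are for demonstration and are not taken from Figure 12.
CODONS = {
    "A": "GCC", "C": "TGC", "D": "GAC", "E": "GAG", "F": "TTC",
    "G": "GGC", "H": "CAC", "I": "ATC", "K": "AAG", "L": "CTG",
    "M": "ATG", "N": "AAC", "P": "CCC", "Q": "CAG", "R": "AGA",
    "S": "AGC", "T": "ACC", "V": "GTG", "W": "TGG", "Y": "TAC",
}


def reverse_translate(fragment):
    """Return a DNA sequence encoding one polypeptide fragment."""
    return "".join(CODONS[residue] for residue in fragment)


def link_in_frame(fragments):
    """Reverse translate each fragment and join the results.

    Every fragment contributes a whole number of codons, so simple
    concatenation keeps all fragments in the same reading frame.
    """
    dna = "".join(reverse_translate(f) for f in fragments)
    assert len(dna) % 3 == 0
    return dna
```

For example, `link_in_frame(["MA", "KW"])` joins the codons for Met-Ala and Lys-Trp into a single twelve-base open reading frame.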

In another embodiment, the invention encompasses a computer for designing the
sequence of a synthetic polypeptide and/or a synthetic polynucleotide of the invention,
wherein said computer comprises: (a) a machine readable
data storage medium comprising a data storage material encoded with machine readable
data, wherein said machine readable data comprises the sequence of a parent polypeptide;
(b) a working memory for storing instructions for processing said machine-readable data;
(c) a central-processing unit coupled to said working memory and to said machine-readable
data storage medium, for processing said machine-readable data into said synthetic
polypeptide sequence and/or said synthetic polynucleotide; and (d) an output hardware
coupled to said central processing unit, for receiving said synthetic polypeptide sequence
and/or said synthetic polynucleotide.

In yet another embodiment, the invention contemplates a computer program
product for designing the sequence of a synthetic polynucleotide of the invention,
comprising code that receives as input the sequence of a parent polypeptide, code that
fragments the sequence of the parent polypeptide into fragments, code that reverse
translates the sequence of a respective fragment to provide a nucleic acid sequence
encoding the fragment, code that links together in the same reading frame each said nucleic
acid sequence to provide a polynucleotide sequence that codes for a polypeptide sequence
in which said fragments are linked together in a different relationship relative to their
linkage in the parent polypeptide sequence, and a computer readable medium that stores
the codes.

A version of these embodiments is presented in Figure 23, which shows a system
10 including a computer 11 comprising a central processing unit ("CPU") 20, a working
memory 22 which may be, e.g., RAM (random-access memory) or "core" memory, mass
storage memory 24 (such as one or more disk drives or CD-ROM drives), one or more
cathode-ray tube ("CRT") display terminals 26, one or more keyboards 28, one or more
input lines 30, and one or more output lines 40, all of which are interconnected by a
conventional bidirectional system bus 50.

Input hardware 36, coupled to computer 11 by input lines 30, may be
implemented in a variety of ways. For example, machine-readable data of this invention
may be inputted via the use of a modem or modems 32 connected by a telephone line or
dedicated data line 34. Alternatively or additionally, the input hardware 36 may comprise
CD-ROM drives or disk drives 24. In conjunction with display terminal 26,
keyboard 28 may also be used as an input device.

Output hardware 46, coupled to computer 11 by output lines 40, may similarly be
implemented by conventional devices. By way of example, output hardware 46 may
include CRT display terminal 26 for displaying a synthetic polynucleotide sequence or a
synthetic polypeptide sequence as described herein. Output hardware might also include a
printer 42, so that hard copy output may be produced, or a disk drive 24, to store system
output for later use.

In operation, CPU 20 coordinates the use of the various input and output devices
36, 46, coordinates data accesses from mass storage 24 and accesses to and from working
memory 22, and determines the sequence of data processing steps. A number of programs
may be used to process the machine readable data of this invention. Exemplary programs
may use for example the steps outlined in the flow diagram illustrated in Figure 24.
Broadly, these steps include (1) inputting at least one parent polypeptide sequence; (2)
optionally adding alanine spacers at the ends of each polypeptide sequence; (3)
fragmenting the polypeptide sequences into fragments (e.g., 30 amino acids long), which
are preferably overlapping (e.g., by 15 amino acids); (4) reverse translating each fragment to
provide a nucleic acid sequence for each fragment and preferably using for the reverse
translation first and second most translationally efficient codons for a cell type, wherein the
codons are preferably alternated out of frame with each other in the overlaps of
consecutive fragments; (5) randomly rearranging the fragments; (6) checking whether
rearranged fragments recreate at least a portion of a parent polypeptide sequence; (7)
repeating randomly rearranging the fragments when rearranged fragments recreate said at
least a portion; or otherwise (8) linking the rearranged fragments together to produce a
synthetic polypeptide sequence and/or a synthetic polynucleotide sequence; and (9)
outputting said synthetic polypeptide sequence and/or a synthetic polynucleotide sequence.
An example of an algorithm which uses inter alia the aforementioned steps is shown in
Figure 25. By way of example, this algorithm has been used for the design of synthetic
polynucleotides and synthetic polypeptides according to the present invention for Hepatitis
C 1a and for melanoma, as illustrated in Figures 26 and 27.
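
Steps (1) to (3) and (5) to (9) above may be sketched in Python as follows; this is a minimal illustration under stated assumptions and is not the algorithm of Figure 25. The reverse translation of step (4) is omitted. The test used for step (6), namely that the five residues either side of a junction between adjacent fragments must not together occur in the parent sequence, is one hypothetical interpretation, and all function and parameter names are likewise hypothetical.

```python
import random

SPACER = "AA"  # two alanine residues, as in optional step (2)


def add_spacers(parent, spacer=SPACER):
    """Step (2): add alanine spacers to both ends of the parent sequence."""
    return spacer + parent + spacer


def fragment(seq, size=30, overlap=15):
    """Step (3): cut seq into fragments of `size` overlapping by `overlap`."""
    step = size - overlap
    frags = []
    for start in range(0, len(seq), step):
        frags.append(seq[start:start + size])
        if start + size >= len(seq):
            break  # the final fragment may be shorter than `size`
    return frags


def recreates_parent(order, padded_parent, k=5):
    """Step (6), ASSUMED criterion: a junction recreates the parent when the
    last k residues of one fragment plus the first k residues of the next
    occur contiguously in the (spacer-padded) parent sequence."""
    for left, right in zip(order, order[1:]):
        if left[-k:] + right[:k] in padded_parent:
            return True
    return False


def design(parent, size=30, overlap=15, seed=0, max_tries=1000):
    """Steps (5) to (9): shuffle until no junction recreates the parent,
    then link the fragments into one synthetic polypeptide sequence."""
    padded = add_spacers(parent)
    frags = fragment(padded, size, overlap)
    rng = random.Random(seed)
    for _ in range(max_tries):
        order = frags[:]
        rng.shuffle(order)
        if not recreates_parent(order, padded):
            return order, "".join(order)
    raise RuntimeError("no acceptable arrangement found")
```

A caller would pass a parent polypeptide sequence to `design` and receive the accepted fragment order together with the linked synthetic sequence; the `seed` argument only makes the random rearrangement reproducible.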

Figure 28 shows a cross section of a magnetic data storage medium 100 which can
be encoded with machine readable data, or set of instructions, for designing a synthetic
molecule of the invention, which can be carried out by a system such as system 10 of
Figure 23. Medium 100 can be a conventional floppy diskette or hard disk, having a
suitable substrate 101, which may be conventional, and a suitable coating 102, which may
be conventional, on one or both sides, containing magnetic domains (not visible) whose
polarity or orientation can be altered magnetically. Medium 100 may also have an opening
(not shown) for receiving the spindle of a disk drive or other data storage device 24. The
magnetic domains of coating 102 of medium 100 are polarised or oriented so as to encode
in a manner which may be conventional, machine readable data such as that described
herein, for execution by a system such as system 10 of Figure 23.

Figure 29 shows a cross section of an optically readable data storage medium 110
which also can be encoded with such machine-readable data, or set of instructions, for
designing a synthetic molecule of the invention, which can be carried out by a system such
as system 10 of Figure 23. Medium 110 can be a conventional compact disk read only
memory (CD-ROM) or a rewritable medium such as a magneto-optical disk, which is
optically readable and magneto-optically writable. Medium 110 preferably has a suitable
substrate 111, which may be conventional, and a suitable coating 112, which may be
conventional, usually on one side of substrate 111.

In the case of CD-ROM, as is well known, coating 112 is reflective and is
impressed with a plurality of pits 113 to encode the machine-readable data. The
arrangement of pits is read by reflecting laser light off the surface of coating 112. A
protective coating 114, which preferably is substantially transparent, is provided on top of
coating 112.

In the case of a magneto-optical disk, as is well known, coating 112 has no pits
113, but has a plurality of magnetic domains whose polarity or orientation can be changed
magnetically when heated above a certain temperature, as by a laser (not shown). The
orientation of the domains can be read by measuring the polarisation of laser light reflected
from coating 112. The arrangement of the domains encodes the data as described above.

In order that the invention may be readily understood and put into practical effect,
particular preferred non-limiting embodiments will now be described as follows.

EXAMPLES

EXAMPLE 1

Preparation of an HIV Savine

Experimental Protocol

Plasmids

The plasmid pDNAVacc is ampicillin resistant and contains an expression
cassette comprising a CMV promoter and enhancer, a synthetic intron, a multiple cloning
site (MCS) and a SV40 poly A signal sequence (Thomson et al., 1998). The plasmid
pTK7.5 contains a selection cassette, a pox virus 7.5 early/late promoter and a MCS
flanked on either side by Vaccinia virus TK gene sequences.

Recombinant Vaccinia Viruses

Recombinant Vaccinia viruses expressing the gag, env (IIB) and pol (LAI) genes
of HIV-1 were used as previously described and denoted VV-GAG, VV-POL, VV-ENV
(Woodberry et al., 1999; Kent et al., 1998).

Marker Rescue Recombination

Recombinant Vaccinia viruses containing Savine constructs were generated by
marker rescue recombination, using protocols described previously (Boyle et al., 1985).
Plaque purified viruses were tested for the TK phenotype and for the appropriate genome
arrangement by Southern blot and PCR.

Oligonucleotides

Oligonucleotides 50 nmol scale and desalted were purchased from Life
Technologies. Short oligonucleotides were resuspended in 100 of water, their
concentration determined, then diluted to 20 µM for use in PCR or sequencing reactions.
Long oligonucleotides for splicing reactions were denatured for 5 minutes at 94°C in
20 µL of formamide loading buffer then 0.5 µL gel purified on a 6% polyacrylamide gel.
Gel slices containing full-length oligonucleotides were visualised with ethidium bromide,
excised, placed in Eppendorf™ tubes, combined with 200 µL of water before being
crushed using the plunger of a 1 mL syringe. Before being used in splicing reactions the
crushed gel was resuspended in an appropriate volume of buffer and 1-2 µL of the
resuspendate used directly in the splicing reactions.

Sequencing 

Sequencing was performed using Dye terminator sequencing reactions and 
analyzed by the Biomedical Resource Facility at the John Curtin School of Medical 
Research using an ABI automated sequencer. 

Restimulation of Lymphocytes from HIV Infected Patients

Two pools of recombinant Vaccinia viruses containing VV-AC1 + VV-BC1 (Pool
1) or VV-AC2 + VV-BC2 + VV-CC2 (Pool 2) were used to restimulate lymphocytes from
the blood samples of HIV-infected patients. Briefly, CTL lines were generated from
HIV-infected donor PBMC. A fifth of the total PBMC were infected with either Pool 1 or Pool 2
Vaccinia viruses then added back to the original cell suspension. The infected cell
suspension was then cultured with IL-7 for 1 week.

CTL Assays

Restimulated PBMCs were used as effectors in a standard 51Cr-release CTL assay.
Targets were autologous EBV-transformed lymphoblastoid cell lines (LCLs) infected with
the following viruses: Pool 1, Pool 2, VV-GAG, VV-POL or VV-ENV. Assay controls
included uninfected targets, targets infected with VV-lacZ (virus control) and K562 cells.

Results 

HIV Savine Design

A main goal of the Savine strategy is to include as much protein sequence
information from a pathogen or cancer as possible in such a way that potential T cell
epitopes remain intact and so that the vaccine or therapy is extremely safe. An HIV Savine
is described herein not only to compare this strategy to other strategies but also to produce
an HIV vaccine that would provide the maximum possible population coverage as well as
catering for the major HIV clades.

A number of design criteria were first determined to exploit the many advantages
of using a synthetic approach. One advantage is that it is possible to use consensus protein
sequences to design these vaccines. Using consensus sequences for a highly variable virus
like HIV should provide better vaccine coverage because individual viral isolate sequences
may have lost epitopes which induce CTL against the majority of other viral isolates. Thus,
using the consensus sequences of each HIV clade rather than individual isolate sequences
should provide better vaccine coverage. Taking this one step further, a consensus sequence
that covers all HIV clades should theoretically provide better coverage than using just the
consensus sequences for individual clades. Before designing such a sequence however, it
was decided that a more appropriate and focussed HIV vaccine might be constructed if the
various clades were first ranked according to their relative importance. To establish such a
ranking the following issues were considered: current prevalence of each clade, the rate at
which each clade is increasing and the capacity of various regions of the world to cope
with the HIV pandemic (Figures 1 and 2). These criteria produced the following ranking:
clade E > clade A > clade C > clade B > clade D > other clades. Clades E and A were
considered almost equal since they are very similar except in their envelope protein
sequences, which differ considerably.

Another advantage of synthesising a designed sequence is that it is possible to
incorporate degenerate sequences into their design. In the case of HIV, this means that
more than one amino acid can be included at various positions to improve the ability of the
vaccine to cater for the various HIV clades and isolates. Coverage is improved because
mutations in different HIV clades and also in individual isolate sequences, while mostly
destroying specific T cell epitopes, can result in the formation of new potentially useful
epitopes nearby (Goulder et al., 1997). Incorporating degenerate amino acid sequences,
however, also means that more than one construct must be made and mixed together. The
number of constructs required depends on the frequency with which mutations are
incorporated into the design. While this approach requires the construction of additional
constructs, these constructs can be prepared from the same set of degenerate long
oligonucleotides, significantly reducing the cost of providing such considerable interclade
coverage.

A set of degeneracy rules was developed for the incorporation of amino acid
mutations into the design which meant that a maximum of eight constructs would be
required so that theoretically all combinations were present, as follows: 1) Two amino
acids at three positions (or less) within any group of nine amino acids (i.e., present in a
CTL epitope); 2) Three amino acids at one position and two at another (or not) within any
group of nine amino acids; 3) Four amino acids at one position and two at another (or not)
within any group of nine amino acids. The reason why these rules were applied to nine
amino acids (the average CTL epitope size) and not to larger stretches of amino acid
sequence to cater for class II-restricted epitopes, is because class II-restricted epitopes
generally have a core sequence of nine amino acids in the middle which bind specifically
to class II MHC molecules with the extra flanking sequences stabilising binding, by
associating with either side of class II MHC antigens in a largely sequence independent
manner (Brown et al., 1993).
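
The three degeneracy rules above can be expressed as a check on every window of nine positions, as in the following Python sketch. The representation of a degenerate sequence as a list of strings, each string holding the alternative residues at one position, and the function names are assumptions made for illustration.

```python
# Allowed patterns of degeneracy within nine residues, largest first.
# Each pattern yields at most 8 combinations (2*2*2, 3*2 or 4*2).
ALLOWED = [
    [], [2], [2, 2], [2, 2, 2],  # rule 1: two residues at up to three positions
    [3], [3, 2],                 # rule 2: three at one position, two at another (or not)
    [4], [4, 2],                 # rule 3: four at one position, two at another (or not)
]


def window_ok(sizes):
    """sizes: number of alternative residues at each position of one window."""
    degenerate = sorted((s for s in sizes if s > 1), reverse=True)
    return degenerate in ALLOWED


def sequence_ok(positions, window=9):
    """positions: list of strings, e.g. ["A", "KR", "L"] for A-(K/R)-L.

    Returns True when every window of nine positions obeys the rules.
    """
    sizes = [len(p) for p in positions]
    if len(sizes) <= window:
        return window_ok(sizes)
    return all(
        window_ok(sizes[i:i + window])
        for i in range(len(sizes) - window + 1)
    )
```

A design that places two alternative residues at four positions within one nine-residue window would be rejected, since it would require sixteen rather than eight constructs.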

Using the HIV clade ranking described above, the amino acid degeneracy rules
and in some situations the similarity between amino acids, a degenerate consensus protein
sequence was designed for each HIV protein using the consensus protein sequences for
each HIV clade compiled by the Los Alamos HIV sequence database (Figures 3-11) (HIV
Molecular Immunology Database, 1997). It is important to note that in some situations the
order with which each of the above design criteria was applied was altered. Each time this
was done the primary goal however was to increase the ability of the Savine to cater for
interclade differences. Two isolate sequences, GenBank accession U51189 and U46016,
for clade E and clade C, respectively, were used when a consensus sequence for some HIV
proteins from these two clades was unavailable (Gao et al., 1996; Salminen et al., 1996).
The design of a consensus sequence for the hypervariable regions of the HIV envelope
protein and in some cases between these regions (hypervariable regions 1-2 and 3-5) was
difficult and so these regions were excluded from the vaccine design.

Once a degenerate consensus sequence was designed for each HIV protein, an
approach was then determined for incorporating all the protein sequences safely into the
vaccine. One convenient approach to ensure that a vaccine will be safe is to systematically
fragment and randomly rearrange the protein sequences together thus abrogating or
otherwise altering their structure and function. The protein sequences still have to be
immunologically functional however, meaning that the process used to fragment the
sequences should not destroy potential epitopes. To decide on the best approach for
systematically fragmenting protein sequences, the main criteria used were the size of T
cell epitopes and their processing requirements. Class I-restricted T cell epitopes are 8-10
amino acids long and generally require 2-3 natural flanking amino acids to ensure their
efficient processing and presentation if placed next to unnatural flanking residues (Del Val
et al., 1991; Thomson et al., 1995). Class II-restricted T cell epitopes range between 12-25
amino acids long and do not appear to require natural flanking residues for processing;
however, it is difficult to rule out a role for natural flanking residues in all cases due to the
complexity of their processing pathways (Thomson et al., 1998). Also class II-restricted
epitopes, despite being larger than CTL epitopes, generally have a core sequence of 9-10
amino acids, which binds to MHC molecules in a sequence specific fashion. Thus, based
on current knowledge, it was decided that an advantageous approach was to overlap the
fragments by at least 15 amino acids to ensure that potential epitopes which might lie
across fragment boundaries are not lost and to ensure that CTL epitopes near fragment
boundaries, that are placed beside or near inhibitory amino acids in adjacent fragments, are
processed efficiently. In deciding the optimal fragment size, the main criteria used were
that size had to be small enough to cause the maximum disruption to the structure and
function of proteins but large enough to cover the sequence information as efficiently as
possible without any further unnecessary duplication. Based on these criteria the fragments
would be twice the overlap size, in this case 30 amino acids long.

The designed degenerate protein sequences were then separated into fragments 30
amino acids long and overlapping by fifteen amino acids. Two alanine amino acids were
also added to the start and end of the first and last fragment for each protein or envelope
protein segment to ensure these fragments were not placed directly adjacent to amino acids
capable of blocking epitope processing (Del Val et al., 1991). The next step was to reverse
translate each protein sequence back into DNA. Duplicating DNA sequences was avoided
when constructing DNA sequences encoding a tandem repeat of identical or homologous
amino acid sequences to maximise expression of the Savine. In this regard, the first and
second most commonly used mammalian codons (shown in Figure 12) were assigned to
amino acids in these repeat regions, wherein a first codon was used to encode an amino
acid in one of the repeated sequences and wherein a second but synonymous codon was
used for the other repeated sequence (e.g., see the gag HIV protein in Figure 13). To cater
for the designed amino acid mutations more than one base was assigned to some positions
using the IUPAC DNA codes without exceeding more than three base variations (eight
possible combinations) in any group of 27 bases (Figure 12). Where a particular
combination of amino acids could not be incorporated, because too many degenerate bases
would be required, some or all of the amino acid degeneracy was removed according to the
protein consensus design rules outlined above. Also the degenerate codons were checked
to determine if they could encode a stop codon; if stop codons could not be avoided then
the amino acid degeneracy was also simplified again according to the protein consensus
design rules outlined above.
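
The two checks on degenerate DNA described above, namely whether a degenerate codon can encode a stop codon and whether any group of 27 bases exceeds eight possible combinations, may be sketched in Python as follows. The IUPAC table and the three stop codons are standard; the function names, and the reading of the limit as a product of alternatives per window, are assumptions made for this sketch.

```python
from itertools import product
from math import prod

# Standard IUPAC nucleotide codes and the bases each one represents.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def expand(seq):
    """List every unambiguous sequence represented by a degenerate one."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in seq))]


def can_encode_stop(codon):
    """True if any expansion of a degenerate codon is a stop codon."""
    return any(c in STOP_CODONS for c in expand(codon))


def max_combinations(dna, window=27):
    """Largest number of distinct sequences encoded by any window of bases."""
    counts = [len(IUPAC[b]) for b in dna]
    if len(counts) <= window:
        return prod(counts)
    return max(
        prod(counts[i:i + window])
        for i in range(len(counts) - window + 1)
    )


def within_limit(dna, window=27, limit=8):
    """ASSUMED reading of the rule: at most `limit` combinations per window."""
    return max_combinations(dna, window) <= limit
```

For instance, the degenerate codon `TRA` expands to `TAA` and `TGA`, both stop codons, so a design containing it in frame would need its amino acid degeneracy simplified.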

The designed DNA segments were then scrambled randomly and joined to create
twenty-two subcassettes approximately 840 bp in size. Extra DNA sequences incorporating
sites for one of the cohesive restriction enzymes XbaI, SpeI, AvrII or NheI and 3 additional
base pairs (to cater for premature Taq polymerase termination) were then added to each
end of each subcassette (Figure 14). Some of these extra DNA sequences also contained
the cohesive restriction sites for SalI or XhoI, Kozak signal sequences and start or stop
codons to enable the subcassettes to be joined and expressed either as three large cassettes
or one full length protein (Figures 14 and 15).

In designing the HIV Savine one issue that required investigation was whether
such a large DNA molecule would be fully expressed and whether epitopes encoded near
the end of the molecule would be efficiently presented to the immune system. The
inventors also wished to show that mixing two or more degenerate Savine constructs
together could induce T cell responses that recognise mutated sequences. To examine both
issues DNA coding for a degenerate murine influenza nucleoprotein CTL epitope,
NP365-373, which differs by two amino acids at positions 71 and 72 in influenza strain A/PR/8/34
compared to the A/NT/60/68 strain and is restricted by H2-Db, was inserted before the last
stop codon at the end of the HIV Savine design (Figure 15). An important and unusual
characteristic of both of these naturally occurring NP365-373 sequences, which enabled
the present inventors to examine the effectiveness of incorporating mutated sequences, is
that they generate CTL responses which do not cross react with the alternate sequence
(Townsend et al., 1986). This is an unusual characteristic because epitopes not destroyed
by mutation usually induce CTL responses that cross-react.
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Up to ten long oligonucleotides up to 100 bases long and two short amplificatioa 
oligonucleotides were synthesised to enable construction of each subcassette (Life 
Technologies). In designing each oligonucleotide the 3' end and in most cases also the 5* 
end had to be either a 'c' or a 'g' to ensure efficient extension during PGR splicing. The 
5 overlap region for each long oligonucleotide was designed to be at least 16 bp with 
approximately 50% G/C content. Also oligonucleotide overlaps were not placed where 
degenerate DNA bases coded for degenerate amino acids to avoid splicing difficidties 
later. Where this was too difficult some degenerate bases were removed according to the 
protein consensus design rules outlined above and indicated in Figure 12. Figure 16 shows 
10 an example of the ohgonucleotides design for each subcassette. 
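
The oligonucleotide design rules above lend themselves to a simple automated check. The following is a minimal sketch only: it assumes that the overlap is the 3'-terminal 16 bases of the oligonucleotide and that "approximately 50%" means within 15 percentage points of 50%. Both are illustrative assumptions, and `check_oligo` is a hypothetical helper, not the inventors' software.

```python
# IUPAC codes that stand for more than one base (degenerate positions).
IUPAC_DEGENERATE = set("RYSWKMBDHVN")


def gc_fraction(seq: str) -> float:
    """Fraction of unambiguous G or C bases in seq."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def check_oligo(oligo: str, overlap_len: int = 16, gc_tolerance: float = 0.15) -> list[str]:
    """Return a list of design-rule violations (empty if none were found)."""
    oligo = oligo.upper()
    problems = []
    if oligo[-1] not in "GC":
        problems.append("3' end is not C or G")
    if oligo[0] not in "GC":
        problems.append("5' end is not C or G (tolerated in some cases)")
    overlap = oligo[-overlap_len:]  # assumed position of the overlap region
    if len(overlap) < overlap_len:
        problems.append("overlap is shorter than the minimum length")
    if abs(gc_fraction(overlap) - 0.5) > gc_tolerance:
        problems.append("overlap G/C content is not close to 50%")
    if any(base in IUPAC_DEGENERATE for base in overlap):
        problems.append("overlap contains a degenerate base")
    return problems


if __name__ == "__main__":
    print(check_oligo("GATATAT" + "GCATGCATGCATGCAG"))
    print(check_oligo("ATATATATATATATATATAT"))
```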

Construction of the HIV Savine

Five of each group of ten designed oligonucleotides were spliced together using
stepwise asymmetric PCR (Sandhu et al., 1992) and Splicing by Overlap Extension
(SOEing) (Figure 17a). Each subcassette was then PCR amplified, cloned into
pBluescript™ II KS- using BamHI/EcoRI and 16 individual clones sequenced. Mutations,
deletions and insertions were present in the large majority of the clones for each
subcassette, despite acrylamide gel purification of the long oligonucleotides. In order to
construct a functional Savine with minimal mutations, two clones for each subcassette with
no insertions or deletions (and hence a complete open reading frame) and with minimal
numbers of non-designed mutations were selected from the sixteen available. The
subcassettes were then excised from their plasmids and joined by stepwise PCR-amplified
ligation using the polymerase blend Elongase™ (Life Technologies), T4 DNA ligase and the
cohesive restriction enzymes XbaI/SpeI/AvrII/NheI, to generate two copies of cassettes A,
B and C as outlined in Figure 14 and shown in Figure 17b. Predicted sequences for these
cassettes are shown in Figure 30. Each cassette was then reamplified by PCR with
Elongase™, cloned into pBluescript™ II KS- and 3 of the resulting plasmid clones
sequenced using 12 of the 36 sequencing primers designed to cover the full length
construct. Clones with minimal or no further mutations were selected for transfer into
plasmids for DNA vaccination or used to make recombinant poxviruses. A summary of the
number of designed and non-designed mutations in each Savine construct is presented in
Table 1.
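
The clone selection step described above (discard clones with insertions or deletions, then keep the two with the fewest non-designed mutations) can be sketched as follows. The clone names and counts are hypothetical and are not data from the study, and the sketch does not model non-designed stop codons, which would also need to be screened out.

```python
from dataclasses import dataclass


@dataclass
class Clone:
    name: str
    indels: int        # insertions plus deletions found by sequencing
    non_designed: int  # non-designed point mutations


def select_clones(clones, n=2):
    """Keep clones with an intact reading frame, then return the n clones
    carrying the fewest non-designed mutations."""
    intact = [c for c in clones if c.indels == 0]
    return sorted(intact, key=lambda c: c.non_designed)[:n]


# Hypothetical sequencing results for one subcassette.
clones = [
    Clone("c01", 1, 0),
    Clone("c02", 0, 3),
    Clone("c03", 0, 1),
    Clone("c04", 2, 2),
    Clone("c05", 0, 2),
]

if __name__ == "__main__":
    print([c.name for c in select_clones(clones)])
```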




TABLE 1

Summary of mutations

                                      Number of mutations
Construct      No. aas    Designed    Expected in    Actual in    Non-designed
                                      2 clones       2 clones
Cassette A     1896       249         124            107          5 (AC1), 8 (AC2)
Cassette B     1184       260         130            124          11 (BC1), 4 (BC2)
Cassette C     1969       276         138            121          10 (CC1), 14 (CC2)
Full length    5742       785         392            352          26 (FL1), 26 (FL2)

Summary of the mutations present in the two full-length clones constructed as determined by
sequencing. Includes the number of mutations designed, expected and actually present in the 2 clones and the
number of non-designed mutations in each cassette and full-length clone.
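
The figures in Table 1 can be compared directly. In every row the "expected in 2 clones" value equals half the designed count, rounded down; treating that as the intended relationship is an assumption here, since this section does not state how the expected values were derived. The sketch below, with a hypothetical helper name, computes the ratio of actual to expected designed mutations for each construct.

```python
# (designed, expected in 2 clones, actual in 2 clones), copied from Table 1.
TABLE_1 = {
    "Cassette A": (249, 124, 107),
    "Cassette B": (260, 130, 124),
    "Cassette C": (276, 138, 121),
    "Full length": (785, 392, 352),
}


def summarise(table):
    """For each construct, report whether expected == designed // 2 and the
    ratio of actual to expected designed mutations."""
    out = {}
    for name, (designed, expected, actual) in table.items():
        out[name] = (expected == designed // 2, actual / expected)
    return out


if __name__ == "__main__":
    for name, (is_half, ratio) in summarise(TABLE_1).items():
        print(f"{name:12s} expected is half of designed: {is_half}  actual/expected: {ratio:.2f}")
```

A ratio below 1 for every construct matches the description of a lower than expected frequency of designed mutations.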



HIV Savine DNA vaccines and Recombinant Vaccinia viruses

To test the immunological effectiveness of the HIV Savine constructs the cassette
sequences were transferred into DNA vaccine and poxvirus vectors. These vectors were then
used either separately in immunological assays described below or together in a
'prime-boost' protocol which has been shown previously to generate strong T cell responses
in vivo (Kent et al., 1997).

DNA vaccination plasmids were constructed by excising the cassettes from the
selected plasmid clones with XbaI/XhoI (cassette A) or XbaI/SalI (cassettes B and C) and
ligating them into pDNAVacc cut with XbaI/XhoI to create pDVAC1, pDVAC2, pDVBC1,
pDVBC2, pDVCC1 and pDVCC2, respectively (Figure 18a). These plasmids were then
further modified by cloning into their XbaI site a DNA fragment excised using XbaI/AvrII
from pTUMERA2 and encoding a synthetic endoplasmic reticulum (ER) signal sequence
from the Adenovirus E1A protein (Persson et al., 1980) (Figure 18a). ER signal sequences
have been shown previously to enhance the presentation of both CTL and T helper
epitopes in vivo (Ishioka, G.Y., 1999; Thomson et al., 1998). The plasmids pDVERAC1,
pDVERBC1, pDVERCC1 and pDVERAC2, pDVERBC2, pDVERCC2 were then mixed
together to create plasmid pool 1 and pool 2, respectively. Each plasmid pool collectively
encodes one copy of the designed full-length HIV Savine.

Plasmids to generate recombinant Vaccinia viruses which express HIV Savine
sequences were constructed by excising the various HIV Savine cassettes from the selected
plasmid clones using BamHI/XhoI (cassette A) or BamHI/SalI (cassettes B and C) and
cloning them into the marker rescue plasmid, pTK7.5, cleaved with BamHI/SalI. These
pTK7.5-derived plasmids were then used to generate recombinant Vaccinia viruses by marker
rescue recombination using established protocols (Boyle et al., 1985), giving VV-AC1,
VV-AC2, VV-BC1, VV-BC2, VV-CC1 and VV-CC2 (Figure 18b).

Two further DNA vaccine plasmids were constructed, each encoding a version of
the full length HIV Savine (Figure 18c). Briefly, the two versions of cassette B were
excised with XhoI and cloned into the corresponding selected plasmid clones containing
cassette A sequences that were cut with XhoI/SalI to generate pBSAB1 and pBSAB2,
respectively. The joined A/B cassettes in pBSAB1 and pBSAB2 were excised with
XbaI/XhoI and cloned into pDVCC1 and pDVCC2, respectively, which had been cleaved with
XbaI/XhoI, to generate pDVFL1 and pDVFL2. These were then further modified to contain
an ER signal sequence using the same cloning strategy as outlined in Figure 18a.



Restimulation of HIV specific lymphocytes from HIV infected patients 

The present inventors examined the capacity of the HIV Savine to restimulate
HIV-specific polyclonal CTL responses from HIV-infected patients. PBMCs from three
different patients were restimulated in vitro with two HIV Savine Vaccinia virus pools
(Pool 1 included VV-AC1 and VV-BC1; Pool 2 included VV-AC2, VV-BC2 and VV-CC2)
then used in CTL lysis assays against LCLs infected either with one of the Savine Vaccinia
virus pools or Vaccinia viruses which express gag, env or pol. Figure 19 clearly shows
that, in all three assays, both HIV Savine viral pools restimulated HIV-specific CTL
responses which could recognise targets expressing whole natural HIV antigens and not
targets which were uninfected or infected with the control Vaccinia virus. Furthermore, in
all three cases, both pools restimulated responses that recognised all three natural HIV
antigens. This result suggests that the combined Savine constructs will provide broader
immunological coverage than single antigen based vaccine approaches. The level of lysis
in each case of targets infected with Savine viral pools was significantly higher than the
lysis recorded for any other infected target. This probably reflects the combined CTL
responses to gag, pol, and env plus other HIV antigens not analysed here but whose
sequences are also incorporated into the Savine constructs.

CTL recognition of each HIV antigen is largely controlled by each patient's HLA
background; hence the pattern of CTL lysis for whole HIV antigens is different in each
patient. Interestingly, this CTL lysis pattern did not change when the second Savine
Vaccinia virus pool was used for CTL restimulation. In these assays, therefore, the
inventors were unable to demonstrate clear differences between pools 1 and 2, despite pool
1 lacking a Vaccinia virus expressing cassette CC1 and despite the many amino acid
differences between the A and B cassettes in each pool (see Table 1).

From the foregoing, the present inventors have developed a novel
vaccine/therapeutic strategy. In one embodiment, pathogen or cancer protein sequences are
systematically fragmented, reverse translated back into DNA, rearranged randomly then
joined back together. The designed synthetic DNA sequence is then constructed using long
oligonucleotides and can be transferred into a range of delivery vectors. The vaccine
vectors used here were DNA vaccine plasmids and recombinant poxvirus vectors which
have been previously shown to elicit strong T cell responses when used together in a
'prime-boost' protocol (Kent et al., 1997). An important advantage of scrambled antigen
vaccines or 'Savines' is that the amount of starting sequence information for the design can
be easily expanded to include the majority of the protein sequences from a pathogen or for
cancer, thereby providing the maximum possible vaccine or therapy coverage for a given
population.
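
The reverse translation step in this strategy can be sketched in a few lines. The example below maps each amino acid to a single fixed codon and checks the result by translating it back with the standard genetic code. The codon chosen for each amino acid is an illustrative assumption only; the design described in this specification additionally uses degenerate bases and consensus rules that are not modelled here.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# One codon per amino acid (hypothetical choice, for illustration only).
PREFERRED = {
    "A": "GCC", "C": "TGC", "D": "GAC", "E": "GAG", "F": "TTC",
    "G": "GGC", "H": "CAC", "I": "ATC", "K": "AAG", "L": "CTG",
    "M": "ATG", "N": "AAC", "P": "CCC", "Q": "CAG", "R": "AGG",
    "S": "AGC", "T": "ACC", "V": "GTG", "W": "TGG", "Y": "TAC",
}


def reverse_translate(protein: str) -> str:
    """Return a DNA sequence encoding protein, one fixed codon per residue."""
    return "".join(PREFERRED[aa] for aa in protein.upper())


def translate(dna: str) -> str:
    """Translate dna (length divisible by 3) with the standard code."""
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna), 3))


if __name__ == "__main__":
    segment = "MKTAYIAKQR"
    dna = reverse_translate(segment)
    print(dna, translate(dna) == segment)
```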

An embodiment of the systematic fragmentation approach described herein was
based on the size and processing requirements for T cell epitopes and was designed to
cause maximal disruption to the structure and function of protein sequences. This
fragmentation approach ensures that the maximum possible range of T cell epitopes will be
present from any incorporated protein sequence without the protein being functional and
able to compromise vaccine safety.

Another important advantage of Savines is that consensus protein sequences can
be used for their design. This feature is only applicable when the design needs to cater for
pathogen or cancer antigens whose sequence varies considerably. HIV is a highly
mutagenic virus, hence this feature was utilised extensively to design a vaccine which has
the potential to cover not only field isolates of HIV but also the major HIV clades involved
in the current HIV pandemic. To construct the HIV Savine, one set of long
oligonucleotides was synthesised, which included degenerate bases in such a way that 8
constructs are theoretically required for the vaccine to contain all combinations in any
stretch of 9 amino acids. The inventors believe that this approach can be improved for the
following reasons: 1) While degenerate bases should be theoretically equally represented,
in practice some degenerate bases were biased towards one base or the other, leading to a
lower than expected frequency of the designed mutations in the two full length HIV
Savines which were constructed (see Table 1). 2) Only sequence combinations actually
present in the HIV clade consensus sequences are required to get full clade coverage,
hence the number of full length constructs needed could be reduced. To reduce the number
of constructs, however, separate sets of long oligonucleotides would have to be synthesised,
significantly increasing the cost, time and effort required to generate a vaccine capable of
such considerable vaccine coverage.

A significant problem during the construction of the HIV Savine synthetic DNA
sequence was the incorporation of non-designed mutations. The most serious types of
mutations were insertions, deletions or those giving rise to stop codons, all of which
change the frame of the synthesised sequences and/or cause premature truncation of the
Savine proteins. These types of mutation were removed during construction of the HIV
Savines by sequencing multiple clones after subcassette and cassette construction and
selecting functional clones. The major source of these non-designed mutations was in the
long oligonucleotides used for Savine synthesis, despite their gel purification. This
problem could be reduced by making the initial subcassettes smaller, thereby reducing the
possibility of corrupted oligonucleotides being incorporated into each subcassette clone.
The second major cause of non-designed mutations was the large number of PCR cycles
required for the PCR and ligation-mediated joining of the subcassettes. Including extra
sequencing and clone selection steps during the subcassette joining process should help to
reduce the frequency of non-designed mutations in future constructs. Finally, another
method that could help reduce the frequency of such mutations at all stages is to use
resolvase treatment. Resolvases are bacteriophage-encoded endonucleases which recognise
disruptions to double stranded DNA and are primarily used by bacteriophages to resolve
Holliday junctions (Mizuuchi, 1982; Youil et al., 1995). T7 endonuclease I has already
been used by the present inventors in synthetic DNA constructions to recognise mutations
and cleave corrupted dsDNA to allow gel purification of correct sequences. Cleavage of
corrupted sequences occurs because, after a simple denaturing and hybridisation step,
mutated DNA hybridises to correct DNA sequences and results in a mispairing of DNA
bases which is able to be recognised by the resolvase. This method resulted in a 50%
reduction in the frequency of errors. Further optimisation of this method and the use of a
thermostable version of this type of enzyme could further reduce the frequency of errors
during long Savine construction.

Two pools of Vaccinia viruses expressing Savine cassettes were both shown to
restimulate HIV-specific responses from three different patients infected with B clade HIV
viruses. These results provide a clear indication that the HIV Savine should provide broad
coverage of the population because each patient had a different HLA pattern yet both pools
were able to restimulate HIV-specific CTL responses in all three patients against all three
natural HIV proteins tested. Also, both pools were shown to restimulate virtually identical
CTL patterns in all three patients. This result was unexpected because some responses
should have been lost or gained due to the amino acid differences between the two pools
and because Pool 1 is only capable of expressing 2/3 of the full length HIV Savine. There
are two suggested reasons why the pattern of CTL lysis was not altered between the two
viral pools. Firstly, the sequences in the Savine constructs are nearly all duplicated because
the fragment sequences overlap. Hence the loss of a third of the Savine may not have
excluded sufficient T cell epitopes for differences to be detected in only three patient
samples against only three HIV proteins. Secondly, while mutations often destroy T cell
epitopes, if the epitopes remain functional, then the CTL they generate frequently can
recognise alternate epitope sequences. Taken together, these findings indirectly suggest that
combining only two Savine constructs may provide robust multiclade coverage. Further
experiments are being carried out to directly examine the capacity of the HIV Savine to
stimulate CTL generated by different strains of HIV. The capacity of the two HIV-1 Savine
Vaccinia vector pools to stimulate CD4+ T cell HIV-1 specific responses from infected
patients was also tested (Figure 20). Both patients showed significant proliferation of
CD4+ T cells, although the two pools did not show consistent patterns, suggesting that the
two pools may provide wider vaccine coverage than using either pool independently.

The present inventors have generated a novel vaccine strategy, which has been
used to generate what the inventors believe to be the most effective HIV candidate vaccine
to date. The inventors have used this vaccine to immunise naive mice. Figure 21 shows
conclusively that the HIV-1 Savine described above can generate a Gag and Nef CTL
response in naive mice. It should be noted, however, that the Nef CTL epitope appeared to
exist only in Pool 1 since it was not restimulated by Pool 2. This is further proof of the
utility of combining HIV-1 Savine Pool 1 and Pool 2 components together to provide
broader vaccine coverage.

The HIV-1 Savine Vaccinia vectors have also been used to restimulate in vivo
HIV-1 responses in pre-immune M. nemestrina monkeys. These experiments (Figure 22)
showed, by IFN-γ ELISPOT and CD69 expression on both CD4 and CD8 T cells, that the
ability of the HIV-1 Savine to restimulate HIV-1 specific responses in vivo is equivalent
to or perhaps better than that of another HIV-1 candidate vaccine.

This is a generic strategy able to be applied to many other human infections or
cancers where T-cell responses are considered to be important for protection or recovery.
With this in mind the inventors have begun constructing Savines for melanoma, cervical
cancer and Hepatitis C. In the case of melanoma, the majority of the currently identified
melanoma antigens have been divided into two groups, one containing antigens associated
with melanoma and one containing differentiation antigens from melanocytes, which are
often upregulated in melanomas. Two Savine constructs are presently being constructed to
cater for these two groups. The reason for making the distinction is that treatment of
melanoma might first proceed using the Savine that incorporates fragments of melanoma
specific antigens only. If this Savine fails to control some metastases then the less specific
Savine containing the melanocyte-specific antigens can then be used. It is important to
point out that other cancers, e.g., testicular and breast cancers, also express many of the
antigens specific to melanomas. Hence the melanoma specific Savine may have therapeutic
benefits for other cancers.

A small Savine is also being constructed for cervical cancer. This Savine will
contain two antigens, E6 and E7, from two strains of human papilloma virus (HPV), HPV-16
and HPV-18, directly linked with causing the majority of cervical cancers worldwide.
There is a large number of sequence differences in these two antigens between the two
strains which would normally require two Savines to be constructed. However, since this
Savine is small, the antigen fragments from both strains are being scrambled together.
While it is normally better for the Savine approach to include all or a majority of the
antigens from a virus, in this case only E6 and E7 are expressed during viral latency or in
cervical carcinomas. Hence, in the interests of simplicity, the rest of the HPV genome will
not be included, although all HPV antigens would be desirable in a Savine against genital
warts.

Two Savines have also been constructed for two strains of hepatitis C, a major
cause of liver disease in the world. Hepatitis C is similar to HIV in the requirements for a
vaccine or therapeutic. However, the major hepatitis C strains share significantly lower
homology, 69-79%, with one another than do the various HIV clades. To cater for this the
inventors have decided to construct two separate constructs to cater for the two major
strains present in Australia, types 1a and 3a, which together cause approximately 80-95% of
hepatitis C infections in this country. Both constructs will be approximately the same size
as the HIV Savine but will be blended together into a single vaccine or therapy.

Overall it is believed that the Savine vaccine strategy is a generic technology
likely to be applied to a wide range of human diseases. It is also believed that, because it is
not necessary to characterise each antigen, this technology will be actively applied to
animal vaccines as well, where research into vaccines or therapies is often inhibited by the
lack of specific reagents, modest research budgets and poor returns on animal vaccines.

EXAMPLE 2 
Hepatitis C Savine 

Synthetic immunomodulatory molecules have also been designed for treating
Hepatitis C. In one example, the algorithm of Figure 25 was applied to a consensus
polyprotein sequence of Hepatitis C 1a to facilitate its segmentation into overlapping
segments (30 aa segments overlapping by 15 aa), the rearrangement of these segments into
a scrambled order and the output of Savine nucleic acid and amino acid sequences, as
shown in Figure 26. Exemplary DNA cassettes (A, B and C) are also shown in Figure 26,
which contain suitable restriction enzyme sites at their ends to facilitate their joining into a
single expressible open reading frame.
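
The segmentation and scrambling steps can be illustrated with a minimal sketch. It is not the algorithm of Figure 25: the function names are hypothetical, the handling of a short final segment is an assumption, and a seeded shuffle stands in for whatever randomisation the actual program uses.

```python
import random


def overlapping_segments(seq: str, size: int = 30, overlap: int = 15) -> list[str]:
    """Split seq into segments of length size, each sharing overlap residues
    with the next. The final segment may be shorter than size."""
    if len(seq) <= size:
        return [seq]
    step = size - overlap
    return [seq[i:i + size] for i in range(0, len(seq) - overlap, step)]


def scramble(segments: list[str], seed: int = 0) -> list[str]:
    """Return the segments in a pseudo-random order (input left unchanged)."""
    shuffled = list(segments)
    random.Random(seed).shuffle(shuffled)
    return shuffled


if __name__ == "__main__":
    parent = "ACDEFGHIKLMNPQRSTVWY" * 3   # hypothetical 60 aa parent sequence
    segs = overlapping_segments(parent)
    savine = "".join(scramble(segs))
    print(len(segs), len(savine))
```

Because adjacent segments share 15 residues, each 15 aa stretch of the parent other than the two ends appears twice in the scrambled product.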

EXAMPLE 3
Melanoma Savine 

The algorithm of Figure 25 was also applied to melanocyte differentiation
antigens (gp100, MART, TRP-1, Tyros, Trp-2, MC1R, MUC1F and MUC1R) and to
melanoma specific antigens (BAGE, GAGE-1, gp100In4, MAGE-1, MAGE-3, PRAME,
TRP2IN2, NYNSO1a, NYNSO1b and LAGE1), as shown in Figure 27, to provide separate
Savine nucleic acid and amino acid sequences for treating or preventing melanoma.

EXAMPLE 4 

Resolvase Repair Experiment 

A resolvase can be used advantageously to repair errors in polynucleotides. The
following procedure outlines resolvase repair of a synthetic 340 bp fragment in which
DNA errors were common.

Method 

The 340 bp fragment was PCR amplified and gel purified on a 4% agarose gel.
After spin purifying, 10 µL of the eluate, corresponding to approximately 100 ng, was
subjected to the resolvase repair treatment. The rest of the DNA sample was stored for later
cloning as the untreated control.

2 µL of 10x PCR buffer, 2 µL of 20 mM MgCl2 and 6 µL of MilliQ™ water
(MQW) and Taq DNA polymerase were added to the 10 µL DNA sample. The mixture
was subjected to the following thermal profile: 95°C for 5 min, 65°C for 30 min, cooled and
held at 37°C. Five µL of 10x T7 endonuclease I buffer, 8 µL of 1/50-diluted T7endoI enzyme
stock and 17 µL of MQW were added, mixed and incubated for 30 min. Loading buffer
was added to the sample and the sample was electrophoresed on a 4% agarose gel. A faint
band corresponding to the full length fragment was excised and subjected to 15 further
cycles of PCR. The amplified fragment was agarose gel purified and, along with the
untreated DNA sample, cloned into pBluescript. Eleven plasmid clones for each DNA
sample were sequenced and the number and type of errors compared (see Tables 2 and 3).






Buffers were as follows: 

10x T7 endonuclease buffer

2.5 mL 1 M TRIS pH 7.8, 0.5 mL 1 M MgCl2, 25 µL 1 M DTT, 50 µL 10 mg/mL BSA,
2 mL MQW made up to a total of 5 mL.

T7 endonuclease I stock

Concentrated sample of enzyme prepared by, and obtained from, Jeff Babon (St
Vincent's Hospital) was diluted 1/50 using the following dilution buffer: 50 µL 1 M TRIS
pH 7.8, OAiiL 1 M EDTA pH 8, 5 µL 100 mM glutathione, 50 µL 10 mg/mL BSA, 2.3 mL
MQW, 2.5 mL glycerol made up to a total of 5 mL.
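
The component concentrations implied by the 10x T7 endonuclease buffer recipe follow from simple dilution arithmetic (stock concentration x volume added / final volume). The sketch below works them out, together with the contribution of the buffer to the digest when 5 µL is used. The 50 µL digest volume is an assumption obtained by summing the volumes stated in the Method (10 + 2 + 2 + 6 + 5 + 8 + 17 µL) and treating the unstated Taq polymerase volume as negligible.

```python
def final_concentration(stock_conc: float, volume_ul: float, total_ul: float) -> float:
    """Concentration after diluting volume_ul of stock into total_ul."""
    return stock_conc * volume_ul / total_ul


BUFFER_TOTAL_UL = 5000.0  # recipe is made up to 5 mL

# component: (stock concentration, volume added in µL)
RECIPE_10X = {
    "TRIS pH 7.8 (M)": (1.0, 2500.0),
    "MgCl2 (M)": (1.0, 500.0),
    "DTT (M)": (1.0, 25.0),
    "BSA (mg/mL)": (10.0, 50.0),
}

conc_10x = {
    name: final_concentration(stock, vol, BUFFER_TOTAL_UL)
    for name, (stock, vol) in RECIPE_10X.items()
}

# Contribution of 5 µL of 10x buffer to an assumed 50 µL digest.
conc_digest = {
    name: final_concentration(conc, 5.0, 50.0) for name, conc in conc_10x.items()
}

if __name__ == "__main__":
    for name in RECIPE_10X:
        print(f"{name:16s} 10x: {conc_10x[name]:.4g}   in digest: {conc_digest[name]:.4g}")
```

On these assumptions the 10x buffer is 0.5 M TRIS, 0.1 M MgCl2, 5 mM DTT and 0.1 mg/mL BSA, and contributes one tenth of each to the digest.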

Results

The results are summarised in Tables 2 and 3. 



TABLE 2

Total errors

Error type                     Untreated    Resolvase treated
A/T to G/C                     6            1
G/C to A/T                     12           7
A/T to deletion                1            1
G/C to deletion                6            3

Clones containing deletions    6/11         3/11
Clones containing mutations    9/11         7/11

Clone summary

                               Untreated    Resolvase treated
Correct clones                 2/11         3/11

Discussion/Conclusion 



While overall the number of correct clones obtained was not significantly
different, there was a significant difference in the level of errors. This reduction in errors
becomes more significant as greater numbers of long oligonucleotides are joined into the
one construct, i.e., increasing the difference between untreated versus treated samples in the
chance of obtaining a correct clone. It is believed that combining another resolvase such as
T4 endonuclease VII may further enhance repair or increase the bias against errors.

Importantly, this experiment was not optimised, e.g., by using proofreading PCR
enzymes or optimised conditions. Finally, if the repair reaction is carried out during normal
PCR, for example, by including a thermostable resolvase, it is believed that amplification
of already damaged long oligonucleotides, and the normal accumulation of PCR induced
errors, even using error reading polymerases during PCR, could be reduced significantly.
The repair of damaged long oligonucleotides is particularly important for synthesis of long
DNA fragments such as in Savines because, while the rate of long oligonucleotide damage
is typically <5%, after joining 10 oligonucleotides, the error rate approaches 50%. This is
true even using the best proofreading PCR enzymes because these enzymes do not verify
the sequence integrity using correct oligonucleotide templates that exist as a significant
majority (95%) in a joining reaction.
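
The way a small per-oligonucleotide damage rate compounds on joining can be sketched with a simple probability model. Assuming each oligonucleotide is damaged independently with probability p (an assumption made for illustration, not a statement from the experiment), the fraction of joined products containing at least one damaged oligonucleotide is 1 - (1 - p)^n.

```python
def fraction_with_error(per_oligo_damage: float, n_oligos: int) -> float:
    """Probability that a product joined from n_oligos oligonucleotides
    contains at least one damaged oligonucleotide, assuming independence."""
    return 1.0 - (1.0 - per_oligo_damage) ** n_oligos


if __name__ == "__main__":
    for n in (1, 5, 10, 20):
        print(n, round(fraction_with_error(0.05, n), 3))
```

With a 5% damage rate and 10 oligonucleotides this model gives about 40% of products carrying at least one damaged oligonucleotide, broadly in line with the figure approaching 50% quoted above.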



The disclosure of every patent, patent application, and publication cited herein is
incorporated herein by reference in its entirety.

The citation of any reference herein should not be construed as an admission that
such reference is available as "Prior Art" to the instant application.

Throughout the specification the aim has been to describe the preferred
embodiments of the invention without limiting the invention to any one embodiment or
specific collection of features. Those of skill in the art will therefore appreciate that, in
light of the instant disclosure, various modifications and changes can be made in the
particular embodiments exemplified without departing from the scope of the present
invention. All such modifications and changes are intended to be included within the
scope of the appended claims.

WHAT IS CLAIMED IS:

1. A synthetic polypeptide comprising a plurality of different segments of at least one 
parent polypeptide, wherein the segments are linked together in a different relationship
relative to their linkage in the at least one parent polypeptide to impede, abrogate or 
otherwise alter at least one function associated with the parent polypeptide. 

2. The synthetic polypeptide of claim 1, consisting essentially of different segments of a 
single parent polypeptide. 

3. The synthetic polypeptide of claim 1, consisting essentially of different segments of a
plurality of different parent polypeptides. 

4. The synthetic polypeptide of claim 1, wherein the segments in said synthetic 
polypeptide are linked sequentially in a different order or arrangement relative to their 
linkage in said at least one parent polypeptide. 

5. The synthetic polypeptide of claim 4, wherein the segments in said synthetic
polypeptide are randomly rearranged relative to their order or arrangement in said at least 
one parent polypeptide. 

6. The synthetic polypeptide of claim 1, wherein the size of an individual segment is at 
least 4 amino acids. 

7. The synthetic polypeptide of claim 6, wherein the size of an individual segment is from 
about 20 to about 60 amino acids. 

8. The synthetic polypeptide of claim 7, wherein the size of an individual segment is 
about 30 amino acids. 

9. The synthetic polypeptide of claim 7, comprising at least 30% of the parent polypeptide 
sequence. 

10. The synthetic polypeptide of claim 1, wherein at least one of said segments comprises 
partial sequence identity or homology to one or more other said segments. 

11. The synthetic polypeptide of claim 10, wherein the sequence identity or homology is 
contained at one or both ends of an individual segment.

12. The synthetic polypeptide of claim 11, wherein one or both ends of said segment
comprises at least 4 contiguous amino acids that are identical to, or homologous with, an 
amino acid sequence contained within one or more other of said segments. 

13. The synthetic polypeptide of claim 10, wherein the size of an individual segment is 
about twice the size of the sequence that is identical or homologous to the or each other 
said segment.

14. The synthetic polypeptide of claim 13, wherein the size of an individual segment is 
about 30 amino acids and the size of the sequence that is identical or homologous to the or 
each other said segment is about 15 amino acids. 

15. The synthetic polypeptide of claim 1, wherein an optional spacer is interposed between 
some or all of the segments. 

16. The synthetic polypeptide of claim 15, wherein the spacer alters proteolytic processing 
and/or presentation of adjacent segment(s). 

17. The synthetic polypeptide of claim 16, wherein the spacer comprises at least one 
neutral amino acid. 

18. The synthetic polypeptide of claim 16, wherein the spacer comprises at least one 
alanine residue. 

19. The synthetic polypeptide of claim 1, wherein the at least one parent polypeptide is 
associated with a disease or condition. 

20. The synthetic polypeptide of claim 1, wherein the at least one parent polypeptide is
selected from a polypeptide of a pathogenic organism, a cancer-associated polypeptide, an 
autoimmune disease-associated polypeptide, an allergy-associated polypeptide or a variant 
or derivative of these. 

21. The synthetic polypeptide of claim 1, wherein the at least one parent polypeptide is a 
polypeptide of a virus. 

22. The synthetic polypeptide of claim 21, wherein the virus is selected from a Human
Immunodeficiency Virus (HIV) or a Hepatitis virus. 

23. The synthetic polypeptide of claim 22, wherein the virus is a Human 
Immunodeficiency Virus (HIV) and the at least one parent polypeptide is selected from 
env, gag, pol, vif, vpr, tat, rev, vpu and nef, or a combination thereof.

24. The synthetic polypeptide of claim 1, wherein the at least one parent polypeptide is a
cancer-associated polypeptide. 

25. The synthetic polypeptide of claim 24, wherein the cancer is melanoma. 

26. The synthetic polypeptide of claim 25, wherein the at least one parent polypeptide is a 
melanocyte differentiation antigen. 

27. The synthetic polypeptide of claim 25, wherein the at least one parent polypeptide is a 
melanocyte differentiation antigen selected from gp100, MART, TRP-1, Tyros, TRP2,
MC1R, MUC1F, MUC1R or a combination thereof.

28. The synthetic polypeptide of claim 25, wherein the at least one parent polypeptide is a 
melanoma-specific antigen. 

29. The synthetic polypeptide of claim 25, wherein the at least one parent polypeptide is a 
melanoma-specific antigen selected from BAGE, GAGE-1, gp100In4, MAGE-1, MAGE-3,
PRAME, TRP2IN2, NYNSO1a, NYNSO1b, LAGE1 or a combination thereof.

30. A synthetic polynucleotide encoding a synthetic polypeptide comprising a plurality of 
different segments of at least one parent polypeptide, wherein the segments are linked 
together in a different relationship relative to their linkage in the at least one parent 
polypeptide to impede, abrogate or otherwise alter at least one function associated with the 
parent polypeptide. 

31. A method for producing the synthetic polynucleotide encoding a synthetic polypeptide 
comprising a plurality of different segments of at least one parent polypeptide, wherein the 
segments are linked together in a different relationship relative to their linkage in the at 
least one parent polypeptide to impede, abrogate or otherwise alter at least one function
associated with the parent polypeptide, said method comprising: 

- linking together in the same reading frame a plurality of nucleic acid sequences 
encoding different segments of the at least one parent polypeptide to form a synthetic 
polynucleotide whose sequence encodes said segments linked together in a different 
relationship relative to their linkage in the at least one parent polypeptide. 

32. The method of claim 31, further comprising fragmenting the sequence of a respective 
parent polypeptide into fragments and linking said fragments together in a different 
relationship relative to their linkage in a respective parent polypeptide sequence. 

33. The method of claim 32, wherein the fragments are randomly linked together. 

34. The method of claim 31, further comprising reverse translating the sequence of a 
respective parent polypeptide or a segment thereof to provide a nucleic acid sequence 
encoding said parent polypeptide or said segment. 

35. The method of claim 34, wherein an amino acid of a respective parent polypeptide 
sequence is reverse translated to provide a codon, which has higher translational efficiency 
than other synonymous codons in a cell of interest. 
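
By way of non-limiting illustration, the reverse translation of claims 34 and 35 may be sketched in Python as follows. The table PREFERRED_CODON and the function reverse_translate are hypothetical names introduced for this sketch only. The table lists one frequently used human codon per amino acid as an assumption; it is not taken from the specification, and codon usage data for the actual cell of interest would be substituted in practice.

```python
# Assumed, illustrative table: one commonly used human codon per amino acid.
# It is NOT taken from the specification; replace it with usage data for the
# cell of interest.
PREFERRED_CODON = {
    "A": "GCC", "C": "TGC", "D": "GAC", "E": "GAG", "F": "TTC",
    "G": "GGC", "H": "CAC", "I": "ATC", "K": "AAG", "L": "CTG",
    "M": "ATG", "N": "AAC", "P": "CCC", "Q": "CAG", "R": "AGA",
    "S": "AGC", "T": "ACC", "V": "GTG", "W": "TGG", "Y": "TAC",
    "*": "TGA",
}


def reverse_translate(protein: str) -> str:
    """Return a nucleic acid sequence encoding `protein`, one codon per residue."""
    codons = []
    for residue in protein.upper():
        if residue not in PREFERRED_CODON:
            raise ValueError(f"unsupported residue: {residue!r}")
        codons.append(PREFERRED_CODON[residue])
    return "".join(codons)
```

Under this assumed table, reverse_translate("MTG") gives ATGACCGGC. Choosing a single codon per amino acid is the simplest policy; it does not by itself address the sequence-context considerations of claims 36 and 37.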

36. The method of claim 35, wherein an amino acid of said parent polypeptide sequence is 
reverse translated to provide a codon which, in the context of adjacent or local sequence 
elements, has a lower propensity of forming an undesirable sequence that is refractory to 
the execution of a task. 

37. The method of claim 35, wherein an amino acid of said parent polypeptide sequence is 
reverse translated to provide a codon which, in the context of adjacent or local sequence 
elements, has a lower propensity of forming an undesirable sequence selected from a 
palindromic sequence or a duplicated sequence, which is refractory to the execution of a 
task selected from cloning or sequencing. 
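
By way of non-limiting illustration, the screening for palindromic or duplicated sequences contemplated in claims 36 and 37 may be sketched in Python as follows. All function names, the window lengths (6 and 8 nucleotides) and the rule of accepting the first acceptable synonymous codon are assumptions made for this sketch and are not taken from the specification. A palindrome is taken here in the nucleic acid sense, that is, a window equal to its own reverse complement.

```python
from typing import List

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_palindromes(seq: str, length: int = 6) -> List[int]:
    """Return start indices of windows that equal their own reverse complement."""
    seq = seq.upper()
    return [i for i in range(len(seq) - length + 1)
            if seq[i:i + length] == reverse_complement(seq[i:i + length])]


def find_duplicates(seq: str, length: int = 8) -> List[str]:
    """Return the windows of `length` nucleotides that occur more than once."""
    seq = seq.upper()
    seen, repeated = set(), set()
    for i in range(len(seq) - length + 1):
        window = seq[i:i + length]
        if window in seen:
            repeated.add(window)
        seen.add(window)
    return sorted(repeated)


def choose_codon(upstream: str, candidates: List[str], length: int = 6) -> str:
    """Pick the first synonymous codon that adds no new palindrome at the junction.

    Falls back to the first candidate when every option creates one.
    """
    baseline = len(find_palindromes(upstream, length))
    for codon in candidates:
        if len(find_palindromes(upstream + codon, length)) == baseline:
            return codon
    return candidates[0]
```

For example, appending the serine codon TCC to an upstream sequence ending in GAAT would create the palindrome GAATTC, so choose_codon("GAAT", ["TCC", "AGC"]) returns AGC instead.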

38. The method of claim 31, further comprising linking a spacer oligonucleotide encoding 
at least one spacer residue between segment-encoding nucleic acids. 

39. The method of claim 38, wherein the spacer oligonucleotide encodes 2 to 3 spacer 
residues. 

40. The method of claim 38 or claim 39, wherein the spacer residue is a neutral amino acid. 

41. The method of claim 38 or claim 39, wherein the spacer residue is alanine. 

42. The method of claim 31, further comprising linking in the same reading frame as other 
segment-containing nucleic acid sequences at least one variant nucleic acid sequence 
which encodes a variant segment having a homologous but not identical amino acid 
sequence relative to other encoded segments. 

43. The method of claim 42, wherein the variant segment comprises conserved and/or 
non-conserved amino acid differences relative to one or more other encoded segments. 

44. The method of claim 43, wherein the differences correspond to sequence 
polymorphisms. 

45. The method of claim 44, wherein degenerate bases are designed or built in to the at 
least one variant nucleic acid sequence to give rise to all desired homologous sequences. 
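
By way of non-limiting illustration, the use of degenerate bases recited in claim 45 may be sketched in Python as follows. The sketch collapses aligned variant codons into a single sequence written with the standard IUPAC nucleotide ambiguity codes (for example R for A or G, S for C or G). The function name degenerate_consensus and the requirement that all variants have equal length are assumptions made for this sketch.

```python
from typing import Sequence

# Standard IUPAC nucleotide ambiguity codes, keyed by the set of bases covered.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def degenerate_consensus(variants: Sequence[str]) -> str:
    """Collapse equal-length aligned DNA variants into one IUPAC-coded sequence."""
    if len({len(v) for v in variants}) != 1:
        raise ValueError("variants must be non-empty and of equal length")
    columns = zip(*(v.upper() for v in variants))
    try:
        return "".join(IUPAC[frozenset(column)] for column in columns)
    except KeyError:
        raise ValueError("variants may contain only A, C, G and T") from None
```

For example, the lysine codon AAG and the arginine codon AGG collapse to ARG. When more than one position of a codon is degenerate, the degenerate codon may also encode residues that were not in the input set, so each degenerate codon would need to be checked against the desired homologous sequences.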

46. The method of claim 31, further comprising optimising the codon composition of the 
synthetic polynucleotide such that it is translated efficiently by a host cell. 

47. A synthetic construct comprising a synthetic polynucleotide encoding a synthetic 
polypeptide comprising a plurality of different segments of at least one parent polypeptide, 
wherein the segments are linked together in a different relationship relative to their linkage 
in the at least one parent polypeptide to impede, abrogate or otherwise alter at least one 
function associated with the parent polypeptide, wherein said synthetic polynucleotide is 
operably linked to a regulatory polynucleotide. 

48. The synthetic construct of claim 47, further including a nucleic acid sequence encoding 
an immunostimulatory molecule. 

49. The synthetic construct of claim 48, wherein the immunostimulatory molecule 
comprises a domain of an invasin protein (Inv). 

50. The synthetic construct of claim 48, wherein the immunostimulatory molecule 
comprises the sequence set forth in SEQ ID NO: 1467 or an immune stimulatory 
homologue thereof. 

51. The synthetic construct of claim 48, wherein the immunostimulatory molecule is a T 
cell co-stimulatory molecule. 

52. The synthetic construct of claim 48, wherein the immunostimulatory molecule is a T 
cell co-stimulatory molecule selected from a B7 molecule or an ICAM molecule. 

53. The synthetic construct of claim 48, wherein the immunostimulatory molecule is a B7 
molecule or a biologically active fragment thereof, or a variant or derivative of these. 

54. The synthetic construct of claim 48, wherein the immunostimulatory molecule is a 
cytokine selected from an interleukin, a lymphokine, tumour necrosis factor or an 
interferon. 

55. The synthetic construct of claim 48, wherein the immunostimulatory molecule is an 
immunomodulatory oligonucleotide. 

56. An immunopotentiating composition, comprising an immunopotentiating agent 
selected from the synthetic polypeptide of claim 1, the synthetic polynucleotide of claim 30 
or the synthetic construct of claim 47, together with a pharmaceutically acceptable carrier. 

57. The composition of claim 56, further comprising an adjuvant. 

58. A method for modulating an immune response, which response is preferably directed 
against a pathogen or a cancer, comprising administering to a patient in need of such 
treatment an effective amount of an immunopotentiating agent selected from the synthetic 
polypeptide of claim 1, the synthetic polynucleotide of claim 30, the synthetic construct of 
claim 47, or the composition of claim 56. 

59. A method for treatment and/or prophylaxis of a disease or condition, comprising 
administering to a patient in need of such treatment an effective amount of an 
immunopotentiating agent selected from the synthetic polypeptide of claim 
1, the synthetic polynucleotide of claim 30, the synthetic construct of claim 47, or the 
composition of claim 56. 

60. A computer program product for designing the sequence of a synthetic polypeptide 
comprising a plurality of different segments of at least one parent polypeptide, wherein the 
segments are linked together in a different relationship relative to their linkage in the at 
least one parent polypeptide to impede, abrogate or otherwise alter at least one function 
associated with the parent polypeptide, said program product comprising: 

- code that receives as input the sequence of said at least one parent polypeptide; 

- code that fragments the sequence of a respective parent polypeptide into 
fragments; 

- code that links together said fragments in a different relationship relative to their 
linkage in said parent polypeptide sequence; and 

- a computer readable medium that stores the codes. 

61. The computer program product of claim 60, further comprising code that randomly 
rearranges said fragments. 

62. The computer program product of claim 60, further comprising code that links the 
sequence of a spacer residue to the sequence of said at least one parent polypeptide or to 
said fragments. 
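
By way of non-limiting illustration, the code elements of claims 60 to 62 (receiving a parent sequence, fragmenting it, randomly rearranging the fragments and linking them with spacer residues) may be sketched in Python as follows. The function names, the default segment size of 30 residues with a 15 residue step, and the two-alanine spacer are assumptions made for this sketch; the defaults merely mirror segment labels such as gag 76-105 and gag 91-120 in the drawings and the alanine spacer of claims 39 and 41.

```python
import random
from typing import List, Optional, Tuple


def fragment(sequence: str, size: int = 30, step: int = 15) -> List[str]:
    """Split a parent sequence into overlapping segments of `size` residues."""
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    if len(sequence) <= size:
        return [sequence]
    starts = list(range(0, len(sequence) - size + 1, step))
    # Anchor a final window on the C-terminus so that no residue is left out.
    if starts[-1] != len(sequence) - size:
        starts.append(len(sequence) - size)
    return [sequence[s:s + size] for s in starts]


def scramble(fragments: List[str], spacer: str = "AA",
             seed: Optional[int] = None) -> Tuple[str, List[int]]:
    """Link fragments in a shuffled order, separated by spacer residues.

    Returns the linked sequence and the order in which fragments were used.
    """
    rng = random.Random(seed)
    order = list(range(len(fragments)))
    # Reshuffle until the order differs from the order in the parent.
    while len(order) > 1 and order == sorted(order):
        rng.shuffle(order)
    return spacer.join(fragments[i] for i in order), order
```

This sketch only guarantees that the overall order differs from the parent order; it does not check whether two fragments that were adjacent in the parent remain adjacent after shuffling, which a fuller implementation might also exclude.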

63. A computer program product for designing the sequence of a synthetic polynucleotide 
encoding a synthetic polypeptide comprising a plurality of different segments of at least 
one parent polypeptide, wherein the segments are linked together in a different relationship 
relative to their linkage in the at least one parent polypeptide to impede, abrogate or 
otherwise alter at least one function associated with the parent polypeptide, comprising: 

- code that receives as input the sequence of at least one parent polypeptide; 

- code that fragments the sequence of a respective parent polypeptide into 
fragments; 

- code that reverse translates the sequence of a respective fragment to provide a 
nucleic acid sequence encoding said fragment; 

- code that links together in the same reading frame each said nucleic acid 
sequence to provide a polynucleotide sequence that codes for a polypeptide sequence in 
which said fragments are linked together in a different relationship relative to their 
linkage in the at least one parent polypeptide sequence; and 

- a computer readable medium that stores the codes. 

64. The computer program product of claim 63, further comprising code that randomly 
rearranges said nucleic acid sequences. 

65. The computer program product of claim 64, further comprising code that reverse 
translates an amino acid of a respective parent polypeptide sequence to provide a codon, 
which has higher translational efficiency than other synonymous codons in a cell of 
interest. 

66. The computer program product of claim 63, further comprising code that reverse 
translates an amino acid of a respective parent polypeptide sequence to provide a codon 
which, in the context of adjacent or local sequence elements, has a lower propensity of 
forming an undesirable sequence that is refractory to the execution of a task. 

67. The computer program product of claim 63, further comprising code that links a spacer 
oligonucleotide to one or more of said nucleic acid sequences. 

68. A computer for designing the sequence of a synthetic polypeptide comprising a 
plurality of different segments of at least one parent polypeptide, wherein the segments are 
linked together in a different relationship relative to their linkage in the at least one parent 
polypeptide to impede, abrogate or otherwise alter at least one function associated with the 
parent polypeptide, wherein said computer comprises: 

(a) a machine-readable data storage medium comprising a data storage material 
encoded with machine-readable data, wherein said machine-readable data comprise the 
sequence of at least one parent polypeptide; 

(b) a working memory for storing instructions for processing said machine-readable 
data; 

(c) a central processing unit coupled to said working memory and to said 
machine-readable data storage medium, for processing said machine readable data to provide said 
synthetic polypeptide sequence; and 

(d) an output hardware coupled to said central processing unit, for receiving said 
synthetic polypeptide sequence. 

69. The computer of claim 68, wherein the processing of said machine readable data 
comprises fragmenting the sequence of a respective parent polypeptide into fragments and 
linking together said fragments in a different relationship relative to their linkage in the 
sequence of said parent polypeptide. 

70. The computer of claim 68, wherein the processing of said machine readable data 
comprises randomly rearranging said fragments. 

71. The computer of claim 68, wherein the processing of said machine readable data 
comprises linking the sequence of a spacer residue to the sequence of said at least one 
parent polypeptide or to said fragments. 



WO 01/90197 PCT/AU01/00622 

72. A computer for designing the sequence of a synthetic polynucleotide encoding a 
synthetic polypeptide comprising a plurality of different segments of at least one parent 
polypeptide, wherein the segments are linked together in a different relationship relative to 
their linkage in the at least one parent polypeptide to impede, abrogate or otherwise alter at 
least one function associated with the parent polypeptide, wherein said computer 
comprises: 

(a) a machine-readable data storage medium comprising a data storage material 
encoded with machine-readable data, wherein said machine-readable data comprise the 
sequence of at least one parent polypeptide; 

(b) a working memory for storing instructions for processing said machine-readable 
data; 

(c) a central processing unit coupled to said working memory and to said 
machine-readable data storage medium, for processing said machine readable data to provide said 
synthetic polynucleotide sequence; and 

(d) an output hardware coupled to said central processing unit, for receiving said 
synthetic polynucleotide sequence. 

73. The computer of claim 72, wherein the processing of said machine readable data 
comprises fragmenting the sequence of a respective parent polypeptide into fragments, 
reverse translating the sequence of a respective fragment to provide a nucleic acid 
sequence encoding said fragment and linking together in the same reading frame each said 
nucleic acid sequence to provide a polynucleotide sequence that codes for a polypeptide 
sequence in which said fragments are linked together in a different relationship relative to 
their linkage in the at least one parent polypeptide sequence. 

74. The computer of claim 72, wherein the processing of said machine readable data 
comprises randomly rearranging said nucleic acid sequences. 

75. The computer of claim 72, wherein the processing of said machine readable data 
comprises reverse translating an amino acid of a respective parent polypeptide sequence to 
provide a codon, which has higher translational efficiency than other synonymous codons 
in a cell of interest. 

76. The computer of claim 72, wherein the processing of said machine readable data 
comprises reverse translating an amino acid of a respective parent polypeptide sequence to 
provide a codon which, in the context of adjacent or local sequence elements, has a lower 
propensity of forming an undesirable sequence that is refractory to the execution of a task. 

77. The computer of claim 72, wherein the processing of said machine readable data 
comprises linking a spacer oligonucleotide to one or more of said nucleic acid sequences. 

[Figure: amino acid alignment of the C-terminal portion of pol, continuing to the start of the vif coding sequence. Rows: DESIGNED SEQ, MUTATED AAs, ISOLATE-E, CONSENSUS-A, CONSENSUS-B, ISOLATE-C, CONSENSUS-D, CONSENSUS-O, CONSENSUS-U, CONSENSUS-CPZ, with residue counts at the end of each row.] 

CONSENSUS A-CPZ FROM LOS ALAMOS HIV SEQUENCE DATABASE 
ISOLATE-C FROM GENBANK U46016 HIV-1 SUBTYPE C (ETHIOPIA) 
ISOLATE-E FROM GENBANK U51189 HIV-1 SUBTYPE E ISOLATE 93TH253 (THAILAND) 

[Figure: amino acid alignment of vif, from the end of the pol coding sequence to the start of the vpr coding sequence. Rows: DESIGNED SEQ, MUTATED AAs, ISOLATE-E, CONSENSUS-A, CONSENSUS-B, ISOLATE-C, CONSENSUS-D, CONSENSUS-CPZ.] 
[Figure: amino acid alignment of vpr, from the end of the vif coding sequence to the start of the tat coding sequence, with the oligomerization and LR domains marked. Rows: DESIGNED SEQ, MUTATED AAs, ISOLATE-E, CONSENSUS-A, CONSENSUS-B, ISOLATE-C, CONSENSUS-D, CONSENSUS-O, CONSENSUS-U, CONSENSUS-CPZ.] 
[Figure: amino acid alignment of tat, with the intramolecular disulfide bonding region, 3' splice junctions, nls, exon boundary and the start of the rev coding sequence marked. Rows: DESIGNED SEQ, MUTATED AAs, ISOLATE-E, CONSENSUS-A, CONSENSUS-B, CONSENSUS-C, CONSENSUS-D, CONSENSUS-F, CONSENSUS-O, CONSENSUS-U, CONSENSUS-CPZ.] 
[Figure: amino acid alignment of rev, with the 3' splice junction, exon boundary, high-affinity binding site, nls and Leu-rich effector domain marked. Rows: DESIGNED SEQ, MUTATED AAs, ISOLATE-E, CONSENSUS-A, CONSENSUS-B, ISOLATE-C (prematurely truncated), CONSENSUS-F, CONSENSUS-O, CONSENSUS-U, CONSENSUS-CPZ.] 
[Figure: amino acid alignment of vpu, with the phosphorylation (phos) sites and the start of the env coding sequence marked. Rows: DESIGNED SEQ, MUTATED AAs, CONSENSUS-A, CONSENSUS-B, ISOLATE-C, CONSENSUS-D, CONSENSUS-F, CONSENSUS-O, CONSENSUS-U, CONSENSUS-CPZ.] 
[Figure: amino acid alignment of env (signal peptide and gp120), from the end of the vpu coding sequence, with HYPERVARIABLE REGIONS 1/2, the V3 neutralization loop, a further HYPERVARIABLE REGION and CD4-related annotations marked. Rows: DESIGNED SEQ, MUTATED AAs, CONSENSUS-A, CONSENSUS-B, CONSENSUS-C, CONSENSUS-D, CONSENSUS-E, CONSENSUS-F, CONSENSUS-G, CONSENSUS-H, CONSENSUS-O, CONSENSUS-U, CONSENSUS-CPZ.] 
[Figure: full-length construct (~17000 bp) assembled from Cassette A (~5600 bp), Cassette B (~5600 bp) and Cassette C (~5800 bp). Restriction sites shown: XbaI*, BamHI, SalI, XhoI, BglII, EcoRI. Junction labels: SalI/XhoI destroyed; XhoI intact.] 

Full length construction after cloning the cassettes into pBS. 
Sites marked with a * are in the pBS MCS 

Cassette Extras (Can be removed from cassette ends) 

A (37bp) 
  5' end (BamHI/Kozak, Start): 5' gc ggatccacc atg ... 
  3' end (SalI, Stop, BglII, EcoRI): ... gtcgac tga agatct gaattc gc 3' 
B (43bp) 
  5' end (BamHI/Kozak, Start, XhoI): 5' gc ggatccacc atg ctcgag ... 
  3' end (XhoI, Stop, BglII, EcoRI): ... ctcgag tga agatct gaattc gc 3' 
C (37bp) 
  5' end (BamHI/Kozak, Start, XhoI): 5' gc ggatccacc atg ctcgag ... 
  3' end (Stop, BglII, EcoRI): ... tga agatct gaattc gc 3' 

Cassette Construction 

Full Length 5687bp 
A1-A4 3330bp; A5-A8 2500bp 
Subcassettes: A1/A2 1670bp; A3/A4 1670bp; A5/A6 1670bp; A7 840bp 
Junction sites shown: XbaI, SpeI, NheI, SpeI/XbaI, AvrII/NheI 

Subcassette Extras (Can be removed from cassette ends) 

SC1 (A 28bp, B/C 34bp) 
  5' end: As for 5' of Cassettes 
  3' end (SpeI, EcoRI): actagt gaattc gc 3' 
SC2 (28bp) 
  5' end (BamHI, XbaI): 5' gc ggatcc tctaga 
  3' end (NheI, EcoRI): gctagc gaattc gc 3' 
SC3 (28bp) 
  5' end (BamHI, SpeI): 5' gc ggatcc actagt 
  3' end (AvrII, EcoRI): cctagg gaattc gc 3' 
SC4 (28bp) 
  5' end (BamHI, NheI): 5' gc ggatcc gctagc 
  3' end (XbaI, EcoRI): tctaga gaattc gc 3' 
SC5 (28bp) 
  5' end (BamHI, SpeI): 5' gc ggatcc actagt 
  3' end (AvrII, EcoRI): ccatgg gaattc gc 3' 
SC6 (28bp) 
  5' end (BamHI, NheI): 5' gc ggatcc gctagc 
  3' end (XbaI, EcoRI): tctaga gaattc gc 3' 
SC7, for Cassettes A and B only (37bp) 
  5' end (BamHI, NheI): 5' gc ggatcc gctagc 
  3' end: As for 3' of Cassettes A/B 
SC7, for Cassette C only (28bp) 
  5' end (BamHI, NheI): 5' gc ggatcc gctagc 
  3' end (SpeI, EcoRI): actagt gaattc gc 3' 
SC8 (31bp) 
  5' end (BamHI, XbaI): 5' gc ggatcc tctaga 
  3' end: As for 3' of Cassette C 

BamHI 



Start 



30 



40 



80 



, , . . env 185-214(149) 'J 

GGdGGATdcACc|ATGjACAGGCCCTTGCiUyiAAACGTCAGCWCCGTGCAATGCACACACG 
CcdcCTAaGTGaTAcjTGTCCGGGAACGTKTTTGCAGTCGWGGCACGTTACGTGTGTGCCTTAGTYT^ 

MTGPCXNVSXVQCTHGIXPVVST> 



50 



100 



110 



S 
120 



130 



gag 76-105 (6) 



160 



ccaactgctcctgaatggctccctdaraagcctctwcaataccrtcgccacactgtggtgcgtccaccaaaggattgasg 
ggttgacgaggacttaccgagggacItyttcggagawgttatggyagcggtgtgacaccacgcaggtggtttcctaactsc 

Q LtiliWGSL'x S LXNTXAT liWCVHQ.R X X> 



170 



180 



190 



200 



210 



220 



pol 31-60(36) 



TCARGGACACAAAGGAAGCCCTCGACAAAATCGA^^CTCGGCGAOXSGCGGAGGCGCTGAWAGGCAAGGCACCTCCAGCTCC 
AGTycCTGTGTTTCCTrCGGGAGCTGTTTTAGCTOjGAGCCGCTACCGCCTCCGCGACTWTCCGTTCCGTGGAGGTCGAGG 
VX DTKEALDK I e'l GDGGGAXRQGT S S S> 



250 



260 



270 



280 



290 



300 



310 



32 0 



YTCARCTlT'CCACAAATCACACTGTGGCAAAGGCCTCTGGTCACdGAACCCTTCAGAAWAMAGAATCCCGAWATGGTGAT 
RAGTYGAAAGGTGTTTAGTGTGACACCGTTTCCGGAGACCAGTGGCTTGGGAAGTCTTWTKTCTTAGGGCTWTACCACTA 
X XF P QITLWQ R P LVT'EPFRXXNi'XMVI> 



poJ 316-345 (55) 35o 



360 



370 



380 



390 



400 



ttaccagtacatggacgatctgtatgtgggaagcgatctggaaatcggacagcatItttaccacacccgataagaaacacc 
aatggtcatgtacctgctagacatacacccttcgctagacctttagcctgtcgtjIaaatggtgtgggctattctttgtgg 

Y QYMDDLYVG S D LEXGQh'fTTPDKK H> 



410 



pol 361-390 (58) ^^o 



450 



460 



470 



480 



aaaaggaaccaccattcctctggatgggatacgaactgcatcccgataggtggaccgtccagccoIyttartttccctcag 

TTTTCCTTGGTGGTAAGGAGACCTACCCTATGCTTOACGTAGGGCTATCCACCTGGCAGGTCGGTjRAATYi^^ 

Q KEPP PLWMGYELHPDRWTVQP'XXFPQ> 



490 



500 



pol 46-75 (37) 530 540 550 560 

ATTACCCTCTGGCAGCGTCCCCTCGTGACARTCAAAATCGGCGGACAGCTCAWAGAGGCTCTGCTCGACACAGGqTCCYA 

^GGRT 
S X> 



TAATGGGAGACCGTCGCAGGGGAGCACrGTYAGTTTTAGCCGCCTGTCGAGTWTCTCCGAGACGAGCTGTGTCCG A( 
I TI.WQRPI,VTXKIGGQLXEA1,I*DTG^, 



570 



580 



590 



tat 46-75(121) 



620 



630 



640 



TGGCAGAAAGAAACGTAGGCAACGTAGASGCGCTCCTCAGAGCAGMRAGGATCACCAATACCCTATCyCTGAGCAACCCC 
ACCGTCTTTCTTTGCATCCGTTGCATCTSCGCGAGGAGTCTCGTCKYTCCTAGTGGTTATGGGATAGRGACTCGTTGGGG 
G RKKRRQRRXAPQSXXDHQYP XXEQP> 



650 



660 



670 



680 



pel 1-30 (34) 



710 



720 



X 



TCYCCTTCTtTAGGGAAJ^ACCTGGCTTTCCMGCAAGGTRAAGCCAGAGAGTTTYCCAGCGAACAGACARGAGCCAATAGC 
AGRGqAAGAAATCCCTTTTGGACCGTiAAGGKCGTTCCAYTTCGGTCTCTCAAARGGTCGCTTGTCTGTYCTCGGTTATCG 
FFRENLAFXQGXAREFXSEQTXANS> 



730 740 750 rev 106-122 (131) spacers 800

* 



yCCRCCTCCAGGAAQAGCCCCCAAATCTCCGGCGAAAGCTCCGYCRTTCTGGGAYCTGGCACCAAAAAC GCCGC' 
RGGYGGAGGTCCTTqTCGGGGGTTTAGAGGCCGCTTTCGAGGCRGYAAGACCCTRGACCGTGGTTTTTqCGGCG^ 
X XSRK'SPQXSGESSXXLGXGTKN 



810 820 830 gag 91-120 (7) 860 870



A1 

join 

T R> A2 
880



aGAATCGAWGTGARAGATACCAAAGAGGCTCTGGATAAGATTGAGGAGGWGCAAAASAAAAGCMAGCAAAAGACKCAAC 
TCTTAGCTWCACTYTCTATGGTTrCTCCGAGACCTATTCTAACTCCTCCWCGTTTTSTTTTCGKTCGTTTTCTGTGTTG 
R IXVXDTKEALDKIEEXQXKSXQKTQ> 



890 900 910 920 pol 601-630 (74) 950 960



AGGCTGCCGCl ftAAGCCGGATACGTCACCGATAGGGGAAGGCAAAAGRTTRTeTCCCTGACAGASACAACCAATCAGAAA 

tccgacggcgaItttcggcctatgcagtggctatccccttccgttttcyaayagagggactgtctstgttggttagtcttt 
kagyvtdrgrqkxxsltxttnqk> 
970 980 990 1000 1010 env 46-75 (140) 1040



0* E L X A I X 
1050 1060



ACCGAACTGCAWGCCATTCAWCJAMGCCRMTACCACACTGTTTTGCGCCAGCGATGCCAAAGCCYATGASACAGAGGTCCA 
TGGCTTGACGTWCGGTAAGTI<jcTKCGGYKATGGTGTGACAAAACGCGGTCGCTACGGTTTCGGRTACTSTGTCTCCAGGT 

XAXTTLFCASDAKAXXTEVH> 



1070 1080 1090 pol 76-105 (39) 1120



CAATGTGTGGGCCACACACGCTTGCGTCCCqGCTGACGATACAGTGCTGGAGGASATSAACCTCCCCGGAARATGGAAGC 
GTTACACACCCGGTGTGTGCGAACGCAGGGGCGACTGCTATGTCACGACCTCCTSTASTrGGAGGGGCCTTYTACCTTCG 



N V W 



1130 1140 1150 1160 1170 1180 1190 1200



CTAAGAT6ATTGGCGGAATCGGCGGATTCATTAAGGTGAGi?ARGATCGGACCCGAAAACCCTTACAATACCCCARTCTTC 
GATTCTACTAACCGCCTTAGCCGCCTAAGTAATTCCACTCaTYCTAGCCTGGGCTTTTGGGAATGTTATGGGGTYAGAAG 
PKMIGGIGGFI KVR*XIGPENPyNTPXP> 

1230 1240 1250 1260 1270 1280 



pol 196-225(47) 



gctatcaagaaaaaggactccaccaaatggagaaagctcgtggatttcagjejrttaggattatcaawatcctctaccaaag 
cgatagttctttttcctgaggtggtttacctctttcgagcacctaaagtcJyaatcctaatagttwtaggagatggtttc 
aikkkdstkwhklvdfr'xriixil yq 



1290 rev 16-45 (125) 1320 1330 1340 1350



s> 

1360 



caatccctatcctagctccgaaggcwccaggcaarccagaargaataggagaaggagatgg ggaggcgaacrggrtaggg 



gttagggataggatcgaggcttccgwggtccgttyggtcttvcttatcctcttcctctacc 
npypssegxrqxrxnrrrrw 



cctccgcttgyccyatccc 

G G E X X R> 



1370 1380 env 525-554 (171) 1410 1420 1430 1440



ATAGGTCCGTGAGACTGGTCARCGGATTCrYAGCCCTCGCCTGGGACGATCTGAGAARCCTCTGCCTCTTdGAMAACCTC 
TATCCAGGCACTCTGACCAGTYGCCTAAGARTCGGGAGCGGACCCTCCTAGACTCTTYGGAGACGGAGAAdcTiaPTGGAG 
DRS VRI.VXGFXALAWDDLRXLCIiF'XNIi> 



1450 1460 1470 env 31-60 (139) 1500 1510 1520



TGGGTCACCGTCTACTATGGCGTCCCCGTCTGGAGAGASGCTRMCACAACCCTCOTTCTGTGCCTCCGACGCTAAGGCTYA 
ACCCAGTGGCAGATGATACCGCAGGGGCAGACCTCTCTSCGAYKGTGTTGGGAGAAGACACGGAGGCTGCGATTCCGART 
WVTVYYGVPVWRXA XTTLFCAS DA KAX> 



spacers 1550 1560 rev 1-30 (124) 1590 1600



CGCTGCC ATGGCTGGCAGAAGCGGCRRCACAGACGAAGAGCTCCTGARGGCTRTCAGAATCATTAASATTCTGTATCAGT 
qCGACGqrACCGACCGTCTTCGCCGYYGTGTCTGCTTCTCGAGGACTYCCGAYAGTCTTAGTAATTSTAAGACATAGTCA 
MAGRSGXTDEELLXAXRIIX ILYQ> 



1610 1620 1630 1640 1650 vif 16-45 (101) 1680



CCAAC CCTTACCCTTCC ATGARAATCAGAACCTGG AAS AGCCTGGTCAAGCATCACATG YACATCTCCAAGAAA 

GGTTGGGAATGGGAAGG^^^^rACTYTTAGTCTTGGACCTTSTCGGACCAGTTCGTAGTGTACRTGTAGAGGTTCTTT 
S N P Y P S a" SUKX RTW XSLVKHHMX I SKK> 



A2 
join 
A3 



1690 1700 1710 1720 1730 1740 1750 1760



gccaawggctggttctataggcatcactwtgassagtccgagstcgtgartcagattatcgaavagctcatcaaaaagga 
cggttwccgaccaagatatccgtagtgawactsctcaggctcsagcactyagtctaatagcttbtcgagtagav2«rttcc^ 
axgwfyrhhxxesexvxqiiexlikke> 



pol 661-690 (78) 1790 1800 1810 1820 1830 1840



AARGGTCTACCTAKCATGGGTACCAGCCCAGAAGGGAATCGG? CAAACCAAAGAGCTCCAGAAMCAGATTMYCAAAATCC 
TTYCCAGATGGATMGTACCCATGGTCGGGTGTTCCCTTAGCCIGTTTGGTTTCTCGAGGTCTTKGTCTAAKRGTTTTAGG 
XVYLXWVPAHKGIGQTKELQXQIXKI> 



1850 pol 916-945 (95) 1880 1890 1900 1910 1920



AAAACTTTAGGGTCTACTATAGGGATAGCAGAGACCCTMTCTGGAAGGGACCdAAAAGCYTTGAGGAAATCTGGRACAAT 
TTTTGAAATCCCAGATGATATCCCTATCGTCTCTGGGAKAGACCTTCCCTGGctTTTTCGRAACTCCTTTAGACCYTGTrA 
QNFRVYYRDSR DPXWKGp'ksXEE IWXN> 
1930 env 405-434 (163) 1960 1970 1980 1990 2000

* ***** 

ATGACATGGATKSAGTGGGAGAGAGAGATTAGCAATTACACAARCCWAATCTATRAGATTCTQARACCCGAACCCACAGC 
TACTGTACCTAMSTCACCCTCTCTCTCTAATCGTTAATGTGTTYGGWTTAGATAYTCTAAGAqTYTGGGCTTGGGTGTCG 
MTWXXWERE I S NYTX2<iyXIL^XPEP TA> 

2010 2020 gag 451-480 (31) 2050 2060 2070 2080



CCCTCCCGCTGAGARTTTCRGATTCGGTGAGCSAAACTACACCCTCCCmAAGCAAGAGCMAAAGGATAAGGAdcAA'rACG 
GGGAGGGCGACTCTYAAAGYCTAAGCCACTCCTTTGATGTGGGAGGGKTTTCGTTCTCGKTTTCCTATTCCTCjGTTATGC 
PPAEXFXFG E ETTPSXKQEXKDKElQy> 

2090 2100 2110 pol 106-135 (41) 2140 2150 2160 



ATCAGATTWTTATTGAGATTTGCGGCAAGAAAGCTATTGGTACAGTGCTCGTGGGACCTACCCCTGTGAATATCATTGGC 
TAGTCTAAKAATAACTCTAAACGCCGTTCTTTCGATAACCATGTCACGAGCACCCTGGATGGGGACACTTATAGTAACCG 
DQIXI EICGKK AXGTVLVGPTPVNI XO 

2170 2180 2190 2200 vpr 46-75 (115) 2230 2240

agjatttacgmacctatggcgatacctgggagggcgtcgaggctctgatcagaaycctccagcaactgmtgtttrtcca 
tcJtaaatgctttggataccgctatggaccctcccgcagctccgagactagtcttrggaggtcgttgackacaaayaggt 

r'i Y E TY G D TW E G VE AL IRXL Q Q.LX F X H:> 
2250 2260 2270 2280 2290 tat 31-60 (120) 2320

tttcagaatcggaItgttwtcattgccaastgtgttttctcaccaaaggtctcggcattagcyacggaaggaaaaagagaa 
aaagtcttagcctkcaawagtaacggttsacacaaaagagtggtttccagagccgtaatcgrtgccttcctttttctctt 
FRI g'cxhcqxcfltkglgisx GRKKR> 

2330 2340 spacers 2370 2380 tat 1-30 (118)

* * . . ^ * * ^ ^ 

RACAGAGAAGGSGAGCTCCCCAJc GCTGCCjATGGACCCCGTGGACCCCAASCTGGAGCCTTGGAAWCACCCTGGCTCCCAG 
YTGTCTCri-TCCSCTCGAGGGGTaCGACGdxACCrGGGGCACCTGGGGTTSGACCTCGGAACCTTWGTGGGACCGAGGGTC 
X Q RRXA PQAaIm D PVD PXLE'PWXH P G S Q> 



2410 2420 2430 2440 2450 2460 2470 2480 



f 



cctamgacagcctgtwmcaaatgctattgcaaaaagtg< :^^^| gaagagacaacccctagccmgaaacaggaacmgaa A3 
ggatkctgtcggacawkgtttacgataacgtttttpcacc I^^^ cttctctgttggggatcggkctttgtccttgkctt I o i n 

PXTACXKCYC KKC P S BETTPS X KQ EX K> 



gag 466-495 (32) 



2510 2520 2530 2540 2550 2560



AGACAAAGAACWCTACCCCCCTTYAGCCAGCCTCAAGTCCCTGTTTGGCAATGAC AATTTCAATATGTGGAAGAATRACA 

tctgtttcttgwgatggggggaartcggtcggagttcagggacaaaccgttactg ttaaagttatacaccttcttaytgt 

DKEXYPPXAS LKSLFGNDNFNMWKNX> 

2570 env 91-120 (143) 2600 2610 2620 2630 2640
* * * *i* *• 

TGGa^GAMCAGATGCAMGAAGACRTTATCTCACTATGGGACCAAAGCCTCAAGCCrTGCGTCAAGCTCGACGTCGGCGAT 
ACCACCTKGTCTACGTKCTrCTGYAATAGAGTGATACCCTGGTTTCGGAGTTCGGAACGCAGTraGAGCTGCAGCCGCTA 
MVXQMXEDXI S LWDQSLKPCVKr»DVGD> 

2650 2660 pol 256-285 (51) 2690 2700 2710 2720 



gcctatttctccgtgcctctggatraarrcttcagaaagtataccgctttcacaatccctagcayaaacaatgag 



CAACT 



cggataaagaggcacggagacctayttyygaagtctttcatatggcgaaagtgttagggatcgtrtttgttactcgttga 



AYFSVP LDKX FRKYTAFTIPSXNWE 



Q L> 



2730 2740 2750 pol 751-780 (84) 2780 2790 2800



gaaaggcgaagccatscatggccaagtgrattgctcaccaggcatttggcaactggattgcacacacctggagggaaagr 
ctttccgcttcggtasgtaccggttcacytaacgagtggtccgtaaaccgttgacctaacgtgtgtggacctccctttcy 
kgeaxhgqvxcspgiwqldcth]:.egk> 

2810 2820 2830 2840 pol 166-195 (45) 2870 2880

TTATC cctaaggtcaagcaatggcctctgacagaggaaaagattaaggctctgactgmgatttgcamagagatggagvaa 
AATAC ggattccagttcgttaccggagactgtctccttttctaattccgagactgackctaaacgtktctctacctcbtt 

X l'pKVKQWPLTEEKIKAIiTXlCXBMEX> 
2890

G AG GG AA AG ATTAGC 
CTCCCTTTCTAATCG 
E G K I S 



2900 2910 pol 331-360 (56) 2940 2950 2960



ATGGATGACCTCTACGTCGGCTCCGACCTGGAGATTGGCCAACATAGGRCCAAAATCGAAGAGCT 
TACCTACTGGAGATGCAGCCGAGGCTGGACCTCTAACCGGTTGTATCCYGGTTTTAGCTTCTCGA 
MDD:i:iyVGSDLEIGQHRXKIEEL> 



2970 2980 2990 3000 3030 3040 pol 616-645 (75)

caggsmcacctcctgaratggggjjctcaccgamaccacaaaccaaaagactgagctccamgctatccawctggctctgc 
gtccsktgtggaggactytaccccJgago'ggctktggtgtttggttttctgactcgaggtkcgataggtwgaccgagacg 

RXHLLXW g'lTX TTWQKTELXAIXLA L> 



3050 3060 3070 3080 3090



pol 796-825 (87) 3120 3130 3140 3150
Q D S G X E V N I V T D



AAGACTCCGGCTyAGAGGTCAACATTGTGACAGAC ATTCCCGCTGAGACTGGTCAAGAGACCGCCTATTTCMTTCTGAAA 

ttctgaggccgartctccagttgtaacactgtctgItaagggcgactctgaccagttctctggcggataaagkaagacttt 

1paetgqetayfxlk> 



3160 3170 3180 3190 3200



CTGGCTGGCAGATGGCCTGTGARAJRYCATTCACACAGACAATGGCUGGACyVAAGATTGAGGAACTGAGASMGCAT 
GACCGACCGTCTACCGGACACTYTVPjGTAAGTGTGTCTGTTACCGJTCCTGTTTCTAACTCCTTGACT^ 
LAGRWPVXX'XHTDNG'RTK,IEELRXHLIi> 



pol 346-375 (57) 3230 3240 3250 3260 3270 3280
3290 vif 166-192 (111) 3320 3330 spacers



ATARGTGGAACRA AC CCCAGAAAAYCAAGGGACRC AGAGRAAATC ACAC AATGAATGGCCAa GCTGC C ACAGAGTCCCAG 
TATYCACCTTG yTTGGGGTCTTTTRGTTCCCTGYGTCTCYTTTAGTGTGTTACTTACCGGT^ CGACGG rCTCTCAGGGTC 
DXWNXPQKXKGXRXNHTMNGH | A A | T E S Q> 

3360 3370 3380 env 436-464 (165) 3410 3420 3430 3440

AATCAGCAAGACAGAAACGAAMAGGAMCTGCTGGMGCTCGACAAATGGGCAAGCCTCTGGAATTGGTTTRACATOAScj^^ 
TTAGTCGTTCTGTCTTTGCTTKTCCTKGACGACCKCGAGCTGTTTACCCGTTCGGAGACCTTAACCAAAYTGTAATSi 
NQ QDRNEXXLI.XLD 



3450 3460 3470



KWAS LWNWFX 
3500 3510 



SGCT 



gag 121-150 (9) 



X X • D> 

3520 



CACCGGAARTAGCTCCMAAGTGTCCCAGAATTACCCTATCGTCCAGAATSYCCAAGGCCAAATGGTCCACCAASCCMTCT 
GTGGCCTTYATCGAGGKTTCACAGGGTCTTAATGGGATAGCAGGTCTTASRGGTTCCGGTTTACCAGGTGGTTSGGKAGA 
TGXSSXVSQNYP1VQNXQGQMVHQXX> 



3530 3540 3550 3560 env 480-509 (168) 3590 3600



cccccagijctcrtcggactgagaatcrttttcgctgtgctcagcattrtcaatagggtcaggcaaggctatagccctctg 
gggggtcJgagyagcctgactcttagyaaaagcgacacgagtcgtaayagttatcccagtccgttccgatatcgggagac 
s pr'lxglr ixfavlsixwrvrqgyspi*> 

3610 3620 3630 3640 3650 vif 106-135 (107) 3680



TCCTTCCAAACCCTCMYC CTCATCCATCTGYAWTACTTTGACTGTTTCKCTGACTCCRCCATTAGGAGAGCCATCCTGGG 



AGGAAGGTTTGGGAGKRC 
S F Q T I, X 



GAGTAGGTAGACRTWATGAAACTGACAAAGMGACTGAGGYGGTAATCCTCTCGGTAGGACCC 
LrHLXYFDCFXDSXIRRAILG> 



3690 3700 3710 3720 3730 3740 3750 3760



ACASAKAGTGAGMAGGAGATGCGAATAqGCTGTGGGAMTCGGAGCCATGWTCYTTGGCTTTCTGGGTGCCGCTGGCTCCA 
TGTSTMTCACTCKTCCTCTACGCTTATcjcGACACCCTKAGCCTCGGTACWAGRAACCGAAAGACCCACGGCGACCGAGGT 
XXVXRRC EY»AVGX6AMXXGFLGAAG S> 



caratcgggcttcacaacccctgacaaaaagcatcagaaagagcctccctttct^^^Sgtcaagaaactgacagagg 
gtytaccccgaagtgttggggactgtttttcgtagtctttctcggagggaaaga c^^^^^ cagttctttgactgtctcc j o 111 

XW6FTTPDKKHQKEPPFLSSV1CKI.'PE> 



A4 
join 
A5 



env 300-329 (156) 3790 3800 3810 3820 3830 3840



CCATGGGCGCTGCCTCCATSACACTGACAGTGCAAGCCTATGACCCTAGCAAAGACCTCRTTGCTGAGATTCAGAAACAG 
GGTACCCGCGACGGAGGTASTGTGACTGTCACGTTCGG ATACTGGGATCGTTTCTGGAGYAACGACTCTAAGTCTTTGTC 
TMGAASXTIjTVQ A*YDPSKDLXAEIQKQ> 
pol 466-495 (65) 3870 3880 3890 3900 3910 3920



GGTCAGGRTCAGTGGACATWTCAGATTTWCCAAGAGCCTTTCAAAAAC 



CCAGTCCYAGTCACCTGTAWAGTCTAAAWGGTTCTCGGAAAGTTTTTGCCTTGGCAGGACCAGCCGGGATGTGGGCAGTT 



X 



3930 



W 



X Q 



K N 



GGAACCGTCCTGGTCGGCCCTACACCCGTCAA 



pol 121-150 (42) 3960 3970



V L V 
3980 



G P T P 
3990 



V N> 
4000 



CATCATCGGAAGGAACMTGCTGACACAGMTTGGCYGCACCCTCAACTTTCCCATTAGC 



GTAGTAGCCTTCCTTGKACGACTGTGTCKAACCGRCGTGGGAGTTGAAAGGGTAATCG TTTCCGTCGGGACGATAGAAAG 



I G R N X L 
4010 4020 



X 



pol 301-330 (54) 



N F P I 
4050 



AAAGGCAGCCCTGCTATCTTTC 



K 



4060 



S P A 
4070 



I F> 
4080 



AGTCCAGCATGMCAMAGATTCTGGAGCCTTTTAGGAWAMAAAACCCTSASATGGTCATCTATCAGTAag^^JIcCTCTG 
TCAGGTCGTACKGTKTCTAAGACCTCGGAAAATCCTWTKTTTTGGGACTSTACCAGTAGATAGTCAT^^^^^GGAGAC 
QSSMXXX LEPFRXXMPXMVIYQYP S P 



4090 4100 4110 nef 136-165 (188) 4140 4150



L> 
4160 



ACATTCGGATGGTGTTTCAAACTGGTCCCCGTGGACCCCAGSGAAGTGGAAGAGRyCAACRAGGGCGAAAACAATTGCCT 
TGTAAGCCTACCACAAAGTTTGACCAGGGGCACCTGGGGTCSCTTCACCTTCTCYRGTTGYTCCCGCTTTTGTTAACGGA 
TFGWCFKI^VPVDPXEVEEXNXGENNCL> 



4170 4180 4190 4200 pol 271-300 (52) 4230 4240



CCTdTTTAGGAAATACACAGCCTTTACCATTCCCTCCAYCAATAACGAAACCCCTGGCATTAGGTATCAGTATAACGTCC 
GGAqAAATCCTTTATGTGTCGGAAATGGTAAGGGAGGTRGTTATTGCTTTGGGGACCGTAATCCATAGTCATATTGCAGG 
i FRKYTA F T X P SXNNETPG X R Y Q YWV> 



4250 4260 4270 4280 4290 env 315-344 (157) 4320



TGCCTCAGGGATGdGGAAGCACAATGGGAGCCGCCAGCATKACCCTCACCGTCCAGGCTAGGCWACTGCTGAGCGGAATC 
ACGGAGTCCCTACcjcCTTCGTGTTACCCTCGGCGGTCGTAMTGGGAGTGGCAGGTCCGATCCGWTGACGAGTCGCCTTAG 
GS TMGAA SXTLTVQ.ARXI*L SG I> 



Q G W " 
4330 4340 4350 4360 4370 pol 451-480 (64) 4400



GTCCAGCAACAGARCAATCTGCTdGMGGAGAATAGGGAAATCCTCARAGAGCCTGTGCATGGCGTCTACTACGATCCCTC 
CAGGTCGTTGTCTYGTTAGACGAcjcKCCTCTTATCCCTTTAGGAGTYTCTCGGACACGTACCGCAGATGATGCTAGGGAG 

XENREII.XBPVHGVYYDPS> 



Q Q Q X NT I* L ' 
4410 4420 4430 4440 4450 vpu 61-81 (136) 4480



CAAGGATCTGRTCGCTGAARTCCAAAAGCAAGGCIASAGAGGAACTGTCCRCCWTGGTGGATATGGGAAACTACGACCTCG 
GTTCCTAGACYAGCGACTTYAGGTTTTCGTTCCOTSTCTCCTTGACAGGYGGWACCACCTATACCCTTTGATGCTGGAGC 
K DLXAEX QKQ G*XEEL SXXV DMGNYDL> 



spacers 4510 4520 4530



B N N 
4570 



vpr 61-90 (116) 



4560 



gagtggacaataacctc gccgct attagaaycctgcaacagctcmtgttcrttcactttaggattggctgccrgcactcc 
ctcacctgttattggaqcggcg; taatcttrggacgttgtcgagkacaagyaagtgaaatcctaaccgacggycgtgagg 

A A IRXIiQQIiXFXHFRIGCXHS> 



4580 4590 4600 4610 gag 406-435 (28) 4640



I G I 
4650 



X R Q R 
4660 



R X R 
4670 



aggattggcatcmyccgtcagagaagggscag^gctcccaggaaaaagggatgctggaagtgtggcaragagggacacca 
tcctaaccgtagkrggcagtctcttcccsgtcajcgagggtcctttttccctacgaccttcacaccgtytctccctgtggt 

A PRKKGCWKCGXEGHQ> 



4680 



GATGAAGGATTGCACTGAGAGACAGGCTAACTTTCTGGGAAAG 

CTACTTCCTAACGTGACTCTCTGTCCGATTGAAAGACCCTTTqCTWCGGTCTGACYAATAGTYTTGGATAACCCCTGACG 
MKD CTERQAWF 



vif 61-90 (104)



R Q A W 
4750 



L G K 
4760 



4690 4700 4710 4720 

*•-*** 

GAWGCCAGACTGRTTATCARAACCTATTGGGGACTGC 

.TAACCCCTGACG 
XARL'XIXTYWGL> 

4770 4780 4790 4800 



ATACCGGTGAGAGAGACTGGCASCTCGGCCAWGGCGTCAGCATTGAGTGGAGGAYAAGGGAAAGGGCTGAGGATAGCGGC 
TATGGCCACTCTCTCTGACCGTSGAGCCGGTWCCGCAGTCGTAACTCACCTCCTRTTCCCTTTCCCGACTCCTATCGCCG 

htgerdwxlgxgvsiewr'xr eraedsg> 



f 

A5 
join 
A6
vpu 46-75 (135) 4830 4840 4850 4860 4870 4880



AACGAAAGCGAAGGCGACASAGAAGAGCTCAGCRCAWTGGTGGACATGGGCAATTACGATCTG^P^I 
TTGCTTTCGCTTCCGCTGTSTCTTCTCGAGTCGYGTWACCACCTGTACCCGTTAATGCTAGAcji^MlGGACGGGGGTC 



NE SEGDXEELSXXVDMGWYDLS S PAPR> 
4890 env 510-539 (170) 4920 4930 4940 4950 4960



GGGACCCGATAGGCyGGRGRGAATCGAAGAGGAAGGCGGAGAGCRAGRCAGAGRCAGAAGCGTCAGGCTCGTGARTGGdA 
CCCTGGGCTATCCGRCCYCYCTTAGCTTCTCCTTCCGCCTCTCGYTCyGTCTCYGTCTTCGCAGTCCGAGCACTYACCOT 

GPD rxxxieeeggexxrxrsvri:.v xgJ 



4970 4980 nef 151-180 (189) 5030 5040



GWGAGGTCGAGGAARYCAATRAGGGAGAGAATAACTGTCTGCTCCACCCTATSRGTCWACATGGCATGGAAGACGAAGAS 
CWCTCCAGCTCCTTyRGTTAyTCCCTCTCTTATTGACAGACGAGGTGGGATASYCAGWTGTACCGTACCTTCTGCTTCTS 
X EVEEXNXGEW.NCLL HPXXXHGMEDEX> 

5050 5060 5070 pol 961-990 (98) 5120

agagaggtcIaatagcgatatcaaagtgg'Tccccagaaggaaagccaaaatcattagggattacggaaagcaaatggctgg 
tctctccaaottatcgctatagtttcaccaggggtcttcctttcggttttagtaatccctaatgcctttcgtttaccgacc 

REV^NS DX KVVP R RKA KIXRDYGKQMAO 



5130 5140 5150 5160 pol 16-45 (35) 5190 5200

* * ♦ * * * 

CGMTGACTGTGTGGCCRGCTTCYCTTCCGAGCAAACARGGGCTAACTCCYCTRCAAGCAGAAAGCTGGGAGACGGAGGCG 
GCKACTGACACACCGGYCGAAGRGAAGGCTCGTTTGTYCCCGATTGAGGRGAYGTTCGTCTTTCGACCCTCTGCCTCCGC 
XDCVAXFXS EQTXAN SXXSRKLGDGG> 

5210 5220 5230 5240 5250 gag 390-420 (27) 

GAGCCGASAGACAGGGAACAAGCTCCAGaTGTTaxrAAOTGCGGCAAAGAGGGACACMTTGCCARAAACTGTAGGGCCCCT 
CTCGGCTSTCTGTCCCTTGTTCGAGGTCQACAAAGTTAACGCCGTTTC'rCCCTGTGKAACGGTyTTTGACATCCCGGGGA 
GAXRQGTSS S*CPNCG K EGHXAXNCRAP> 

5290 5300 5310 5320 5330 5340 5350 5360 

* *■ * * **** 

CGC AAGAAAGGTTGTTG GA AATGCGGAARGGAAGG CCATlCAAATGAAAGACTGTACCGAAAGG CAAGC C AATTTC CTCGG 
GCGTTCTTTCCAACAA.CCTTTACGCCTTYCCTTCCGGTAGTTTACTTTCTGACATGGCTTTCCGTTCGGTTAAAGGAGCC 
RKKGCWKCGXEG H*QMKDCTERQANFLG> 

gag 421-450 (29) 5390 5400 5410 5420 5430 5440



CAAAATCTGGCCCTCCMRCAAAGGCAGACCCGGAAACTTTCYCCAAAGi 
GTTTTAGACCGGGAGGKYGTTTCCGTCTGGGCCTTTGAAAGRGGTTTCC 
KIWPSXKGRPGNFXQ S 



AAMTGGCTCTGGTATATCAAAATCTTTATCA^ 
TTKACCGAGACCATATAGTTTTAGAAATAGT 
XWL>WYIKIFI> 



5450 env 465-494 (167) 5480 5490 5500 5510 5520

tgatcgtcggtggactgrttggcctcaggattrtctttgccgtcctgtccatcrttaacIggagccgygagccragacctc 
actagcagccacctgacyaaccggagtcctaayagaaacggcaggacaggtagyaattdcctcggcrctcggytctggag 

MIVGGLXGLR IX FAVL SIXN'GAXSXDri> 

5530 5540 nef 31-60 (181) 5570 5580 spacers



gataaacatggcgctmttacaagctccaataccsctgccaataacsctgactgtgyctggctgraggci gctgccatgac 



CTATTTGTACCGCGAKAATGTTCGAGGTTATGGSGACGGTTATTGSGACTGACACRGACCGACYTCCG^CGACGG 
DKHGAXTSSNTXANNXDCXWLX 



TACTG 



CCTGCCCCCAG fi^Q 

join 
A7 



5610 5620 5630 vpu 1-30 (132)

ACCCCTGGAGATCATCGCTATCGTCGCCYTTATCGTCGCCCTCATCMTAGCCATTGTGGTCTGGACAATCGYCTWCATTG 
TGGG6ACCXCTAGTAGCGATAGCAGCGGRAATAGCAGCGGGAGTAGKATCGGTAACACCAGACCTGTTAGCRGAWGTAAC 

PLEIIAXVAXIVALIXAIVVWTlXXr> A 

5690 5700 5710 5720 pol 136-165 (43) 5750 5760

* * * ^ ' ^ ^ ' ' -k * /\7 



ACTA*: gp^^g AATWTGCTCACCCAAMTdGGAYGCACACTGAATTTCCCTATCTCCCCCATTGASACAGTGCCTGTGAAA join 
TCATi jSfc^^l TTAKACGAGTGGGTTKAGCCTRCGTGTGACTTAAAGGGATAGAGGGGGTAACTSTGTCACGGACACTTT -* 
E Y "^{^.'^''g ' NXLTQXGXTLNFPISPIXTVPVK> 

FIGURE 15 (Cont) 



WO 01/90197 PCT/AU01/00622



5770 spacers 5800 5810 env 255-284 (153) 5840

CTGAAACCCGGAATGGATGGC GCCGCC AyCTTTAGGCCTGGCGGAGGCRATATSARAGACAATTGGAGAAGCGAACTGTA • 
GACTTTGGGCCTTACCTACCC CGGCGG rRGAAATCCGGACCGCCTCCGyTATASTYTCTGTTAACCTCTTCGCTTGACAT 
LKPGMDGAAXFRP GGGXXXDNWRS ELY> 



5850 5860 5870 5880 5890 5900 5910 5920



TAAGTATAAGGTCGTGRAGATTRAGCCTCTGGGARTC ACATGGATTCCCGAATGGGAGTTCGTCAACACACCC CCACTG G 
ATTCATATTCCAGCACYTCTAAYTCGGAGACCCTYAGTGTACCTAAGGGCTTACCCTCAAGCAGTTGTGTGGGGGTGACC 
KYKVVXIXPLGXTWI PEWEFVNTP PL.> 



5950 5960 5970 5980 5990 6000 pol 556-585 (71)

TCAAGCTATGGTATCAGCTGGAGAAAGASCCTATCG-YTGGCGYTGAdcCTCAGGATCTCAACAYGATGCTGAATAyTGTA 
AGTTCGATACCATAGTCGACCTCTTTCTSGGATAGCRACCGCRACTqGGAGTCCTAGAGTTGTRCTACGACTTATRACAT 
VKLWYQLEKXPI KGXe'pQDL.NXMLNXV> 



6010 gag 181-210 (13) 6050 6060 6070 6080



ggaggccatcaggccgctatgcaaatgctgaaagasacaatcaatgaggaagccgcJgtcctgtttctggatggcattra 
cctccggtagtccggcgatacgtttacgactttctstgttagttactccttcggcgmcaggacaaagacctaccgtaayt 
gghqaamqmlkxtineeaa'vlfldgix> 



6090 6100 pol 706-735 (81) 6130 6140 6150 6160



CAAAGCTCAAGAGGAACATGAGARGTATCACTCCAACTGGAGGACAATGGCCARCGAMTTTAATCTdMTGAAGCATMTCG 
GTTTCGAGTTCTCCTTGTACTCTYCATAGTGAGGTTGACCTCCTGTTACCGGTYGC'TKAAATTAGAdKACTTCGTAKAGC 
KAQ EEHEXYHSNWRTMAXXFN I>'"" 



6170 6180 6190 gag 31-60 (3) 6220



X K H X> 
6230 6240 



tctgggcctctagggagctggagagattcgctctgaatcccrgcctgctggagacakccgaaggctgtmagcaaatoJgct 



AGACCCGGAGATCCCTCGACCTCTCTAAGCGAGACTTAGGGYCGGACGACCTCTGTMGGCTTCCGACAKTCGTTTA/s^Ci 
VWASR ELERFAIiNPXIiIjETXEGCXQ ri- 



ICGA 
A> 



6250 6260 6270 6280 env 215-244 (151) 6310 6320



G^GGAAGAGATTATCATTAGGTCCGAGAATYTCACARACAATGYCAAAACCATTATCGTCCAMCTCAACRAAAGCGTCGW 
CTCCTTCTCTAATAGTAATCCAGGCTCTTARAGTGTYTGTTACRGTTTTGGTAATAGCAGGTKGAGTTGYTTTCGCAGCW 
EEEIIIRSENXTXNXKTirVXDNXSVX> 



6330 6340 6350 6360 6370 6400 gag 1-30 (1)

* >r w - 

gattaacIatgggcgctagggctagtgtcctcagmggcggcragctggacgcctgggaaaagattaggctcaggcctggcg 
ctaattScacccgcgatcccgatcacaggagtckccgccgytcgacctgcggacccttttctaatccgagtccggaccgc 

IN*MGA RASVLXGGXL DAWEKIRLRPG> 

nef 91-120 (185) 6480



NT * W 

6410 6420 6430 6440 6450



gaaagaaaaagtatagJctcaaggagaagggaggcctggasggactgrtttactccmaaaagaggcaagasattctggat 
ctttctttttcatatccgagttcctcttccctccggacctscctgacyaaatgaggkttttctccgttctstaagaccta 

GKKKYR'LKEKGGLXGriXYSXKRQXILD> 



6490 6500 6510 6520 6530 6540 6550 6560



CTGTGGGTGTAT7MACACACAGGGATT( 
GACACCCACATAKTGTGTGTCCCTAA< 



TGGGGAACCWTGATCCTCGGCWTGGTGATKATCTGTAGCGCCAGCGA 
ACCCCTTGGWACTAGGAGCCGWACCACTAMTAGACATCGCGGTCGCT 



LWVYXTQGFTRWGTXriiGXVX 

6590 6600 6610 6620 



I C S A S X> 
6630 6640 



B1 
join 
B2 



env 16-45 (138) 

SAATCTGTGGGTGACAGTGTAa-TACGGAGTGCCTGTGTGGAGGAGACWGCTCCTGTCCGGCATTGTGCAACAGCAAARTA 
STTAGACACCCACTGTCACATAATGCCTCACGGACACACCTCC TCTGWCGAGGACAGGCCGTAACACGTTGTCGTTTYAT 
NLWVTVYYG VPVWRR XIii:iSGlVQQQX> 



6650 env 330-359 (158) 6680 6690 6700 6710 6720



ACCTCCTGAGGGCTATCGAAGCCCAACAGCATCTGCTCCAGCTCACCGTCTGG 



TGGAGGACTCCCGATAGCTTCGGGTTGTCGTAGACGAGGTCGAGTGGCAGACC CAGTCCGTAAAGGGGTCCGGAACCGAG 



Q Q H 



V W 



GTCAGGC ATTTC C CC AGGCCTTGGCTC 



W L> 
vpr 31-60 (114) 6750 6760 6770 6780 6790 6800



CACRRCCTGGGACAGYACATCTATGAGACATACGGAGACACATGGKMGGGAGTGGAAGCCCTCjAMAGCCCTCATCAMACC 
GTGyyGGACCCTGTCRTGTAGATACTCTGTATGCCTCTGTGTACCMKCCCTCACCTTCGGGAGTKTCGGGAGTAGTKTGG 
HXLG QiXIYETYGDTWXGVEAL'xAL IX P> 

6810 vif 151-180 (110) 6860 6870 6880

CAAAAftGATTARGCCTCCCCTCCCATCCGTGAAAAAGCTCACCGAAGACARATGGAATRAGCCTCAAAAGAYAj^ 
GTrTTTCTAATyCGGAGGGGAGGGTAGGCACTTTTTCGAGTGGCTTCTGTYTACCTTAyTCGGAGTTTTCTRm 

KKIX PPLPS VKKLTEDXWNX PQKX'Y S> 

6890 6900 pol 901-930 (94)

CTGGCGAAAGGATTRTCGATATCATTGCAWCCGACATTCAGACTAAGGAACTGCAAAASCAAATCMYAAAGATTCAGAA'X* 
GACCGCTTTCCTAAyAGCTATAGTAACGTWGGCTGTAAGTCTGATTCCTTGACGTTTTSGTTTAGKRTTTCTAAGTCTTA 
AGERI XDIXAXDIQTKELQXQIXK I QN> 

6970 6980 6990 pol 886-915 (93) 7020 7030 7040

ttcIgctgtgtttatccataactttaagaggaagggaggcattggcxsgctactccgccggagagagaatcrttgacattat 
aaqcgacacaaataggtattgaaattctccttccctccgtaaccgccgatgaggcggcctctctcttagyaactgtaata 
f'avfihnfkrkggiggysagerixdi i> 

7050 7060 7070 7080 gag 256-285 (18) 7110 7120

CGCCASCGATATqRTTCCCGTGGGCGAWATCTATAAGAGATGGATCATTCTGGGACTCAACAAAATCGTGAGAATGTATY 
GCGGTSGCTATAqYAAGGGCACCCGCTWTAGATATTCTCTACCTAGTAAGACCCTGAGTTGTTTTAGCACTCTTACATAR 
AXDI*X PVGX IYKRWIILGI.NK1VRMY> 

7130 7140 7150 7160 7170 env 495-524 (169) 7200

fclACCCGTCAGCATTCTGGATATckGAGTGAGACAGGGATACTCCCCCCTCAGCTTTCAGACACTGMYGCCCGCTCCCAGA 
KTGGGCAGTCGTAAGACCTATAOrCTCACTerGTCCCTATGAGGGGGGAGTCGAAAGTCTGTGACKRCGGGCGAGGGTCT 
XPVSI IjDI'RVRQGYSPLSFQTLXPA PR> 

7210 7220 7230 7240 7250 7260 7270 7280 

* * * .* * * * * 

ggccctgacagacycgrasgcattgaggaagaotccagscaggaccatcagtatcccattyccgaacagcctctgy 
ccgggactgtctgrgcytscgtaacrccttctcpggtcsgtcctggtagtcatagggtaarggcttgtcggagacrgagt 
g pdrxxx ieee»sxqdh q ypixbqpl x q> 



tat 61-90 (122) 



7310 7320 7330 7340 7350 7360 



^^^ririARTnnTrfiARTrr'ATGAATAAGGAACTGA ^2 



GMCAAGGGGAGRCAATCCCACAGRCCCTRAGGAAAGCAAAAAGg^pl GGAGTGGTCGAGTCCATGAATAAGGAACTGA 
CKGTTCCCCTCYGTTAGGGTGTCYGGGA YTCCTTTCGTTTTTC^^^^CCTCACCAGCTCAGGTACTTATTCCTTGACT j Ol fl 
X RGXN PTXP XES K KA S GVVE S MNKEL> Q3 



7370 pol 856-885 (91) 7400 7410 7420 7430 7440



AAAAGATTATCGGACAGGTCAGGGAMCAGGCTGAGCACCTGAAAACCGCTGTGOUIATGSCTGCCATGCAGATGCTCAAG 
rprpTTCTAATAGCCTGTCCAGTCCCTKGTCCGACTCGTGGACTTTTGGCGACACGTTTACI^GACGGTACGTCTACGAGTTC 
KKI IGQVRXQAEHLKTAVQMAAMQMIiK> 

7450 7460 gag 196-225 (14) 7490 7500 7510 7520

GAWACCATTAACGAAGAGGCTGCCGAGTGGGACAGARTCCATCCCGTCCATGCCGGACCCRTTSCCCCT rTCACCGMGAT 
CTWTGGTAATTGCTTCTCCGACGGCTCACCCTGTCTYAGGTAGGGCAGGTACGGCCTGGGYAASGGGGASAGTGGCKCTA 
XTINE EAAEWDRXHPVH AGPXXP*LT X I> 

7530 7540 7550 pol 181-210 (46) 7580 7590 7600

* * ^ * * 

TTGTAMAGAAATGGAAVAAGAAGGCAAAATCTCCA-RGATTGGCCCTGAGAATCCCTATAACACACCCRTCTTTGCCATaC 

AACATKTCTTTACCTTBTTCTTCCGTTTTAGAGGTYCTAACCGGGACTCTTAGGGATATTGTGTGGGYAGAAACGGTA? G 

C XEMEXEGK ISXI GPENPYNTPXFAI>' 

7610 7620 7630 7640 pol 871-900 (92) 7670 7680

AAGTGAGAGASCAAGCCGAACACCTCAAGACAGCCGTCCAGATGGCAGTCTTCATTCACAATTTCAAAAGGARAGGCGGA 
TTCACTCTCTSGTTCGGCTTGTGGAGTTCTGTCGGCAGGTCTACCGTCAGAAGTAAGTGTTAAAGTTTTCCTYTCCGCCT 
QV RXQAEHLK TAVQMAV FIHNFKRXGG> 
7690 7700 7710 7740 7750 7760 pol 211-240 (48)

ATCGGAGGC AAAAAGAAAGATAGCACAAAGTGGAGGAAACTGiSTAGACTTTAGGGAGCTCAACAAACGTACACAGGATTT 
TAGCCTCCC TTTTTCTTTCTATCGTGTTTCACCTCCTTTGACCATCTGAAATCCCTCGAGTTGTTTGCATGTGTCCTAAA 
I GG^KKKDSTKWRKLVDFRELNKRTQ DF> 



7770 7780 7790 7800 env 540-569 (172) 7830 7840



CTGGGAGGTCCAGCTCGGC 
GACCCTCCAGGTCGAGCCG 
W E V Q L G 



TTTTYGGCTCTGGCTTGGGATGACCTCAGGAGCCTGTGTCTGTTCAGCTATCACAGACTGA 
AAAARCCGAGACCGAACCCTACTGGAGTCCTCGGACACAGACAAGTCGATAGTGTCTGACT 
FXAIiAWDD IiRS LCLFSYHRL> 



7850 7860 7870 7880 7890 vpr 76-96 (117) 7920



GAGACYTTATCCTCATCGY'IGCCAGAAycjTGCCRACATAGCAGAATCGGCATCACTAGGCAACGTAGAGSTA'GGi^ 

CTCTGRAATAGGAGTAGCRACGGTCTTRqACGGyTGTATCGTCTTAGCCGTAGTGATCCGTTGCATCTCSATCCTTGCCG 

RDXILIXARX»CXHSRXGITRQRRXRNG> 



spacers 7950 7960 7970 env 155-184 (147)



KCCTCCAGGTCC GCTGCcjcCCAAARTCWCCTTCGAMCCCATTCCCATTCACTATTGCGCTCCCGCTGGCTWCGCTATCCT 
MGGAGGTCCAGC CGACGqGGGTTTYAGWGGAAGCTKGGGrAAGGGTAAGTGATAACGCGAGGGCGACCGAWGCGATAGGA 
X srsIaaIpkxxfxpi PIHY CAPAGXA I L> 



8010 8020 8030 8040 8050 8080 vif 76-105 (105)

CAAGTGTi\ACRATAAGAMMTTCAATGGcl3AAARGGATT6GCAWCTGGGACASGGAGTGTCCATCGAATGGA^ 
GTTCACATTGYTATTCTKKAAGTTACCGklOTTYCCTAACCG 

kcwxkxfng'exdwxlgxgvsi EWRXK> 



8090 8100 8110 8120 8130 gag 481-499 (33) 8160

I ctctatcctccctyagcttccctgaaaagcctcttc 
I gagataggagggartcgaagggagttttcggagaag 



gstatagcacacaggtggaccctgrcctcgccgatcac| 
csatatcgtgtgtccacctgggacyggagcggctagt<^ 

XYSTQVDPXLADQPSLYPPXASI.KSLF> 

8170 spacers 8200 8210 vif 121-150 (108) 8240



GGAAACGATCCCTYATCCCA2 GCCGC1 AGAAGGGCTATCCTCGGCCAWAKAGTCAGSAGAAGGTGTGAGTATCMGKCCGG 
CCTTTGCTAGGGARTAGGGTQ CGGCGA rCTTCCCGATAGGAGCCGGTWTMTC AGTCSTCTTCCACACTCATAGKCMGGC C 

gndpxsqaarraii,gxxvxrrceyxxg> 



8250 8260 8270 8280 8290 8300 8310 8320



acacaataaggtcggctccctgcaatacctcgcactaagccaacccamaaccgcttgcwma^ 
tgtgttattccagccgagggacgttatggagcgtgagtcggttgggtkttggcgaacgwkgttcacaatgacattcttta 

HWKVGSLQYLAI.'SQPXTACXKCYCKK> 



8400 



tat 16-45 (119) 3^^° ^^^l pol 976-995 (99) 

GTTGCTWCCACTGTCAGSTCTGCTTCCTGAMGAAGGGACTGGGAATCAGGGATTACGGAAAGCAAATGGCTGGCGMTGAC 
CAACGAWGGTGACAGTCSAGACGAAGGACTKCTT,CCCTGACCCTTACTCCCTAATGCCTTTCGTTTACCGACCGCKACTG 

ccxhcqxcflxkglgirdygkqmagxd> 



8410 spacers 8440



CVAXRQ DED 

8490 8500 8510 



8450 pol 721-750 (82) 8480



tgtgtggccrgcaggcaagacgaagacgcagccaagtaccatagcaattggagaaccatggccartgastttaacctccc 

qC GTCGC ttcatggtatcgttaacctcttggtaccggtyactsaaattggaggg 

AAKY.HSNWRTMAXXFNL P> 



8520 8530 8540 8550 8560



ccctatcgtcsctaaggaaatcgtcgcawrttgcgataagtg^aacgaatggrcactggaactgctggaggaactgaaam 
gggatagcagsgattcctttagcagcgtwyaacgctattcac^ttgcttaccygtgaccttgacgacctccttgactttk 
PIVXKE iv'axcdkcwewxlelleelk> 



vpr 16-45 (113) 8590 8600 8610 8620 8630 8640



awgaagccgtgagacactttcccagaccctggctgcatggcctcggtcaacacgatrtcattagcctctgggatcagtcc 
twcttcggcactctgtgaaagggtctgggaccgacgtaccggagccagttgtgctayagtaatcggagaccctagtcagg 
xeavrhfprpwlhglgqhdxislwdqs> 



B3 
join 
B4
env 106-144 (144) 8700 8710 8720

CTGAAACCCTGTGTGAAACTGACACCCCTGTGCGTCACCCTCAACTGTACCAATGCCAATCTGMWGAAGAGMTACTCCAC 
GACTTTGGGACACACTTTGACTGTGGGGAGACGCAGTGGGAGTTGACATGGTTACGGTTAGACKWCTTCTCKATGAGGTG 
h K PCVKLrPLCV-Tr.NCTW-ANL'xKXYS T> 

8730 8740 vif 91-120 (106) 8780 8790 8800

CCAAGTGGACCCCGRTCTGGCTGACCAWCTGATTCACCTCCACTATTTCGATTGCTTTKCCGATAGCRCAATcjcATCCCA 
GGITCACCTGGGGCYAGACCGACTGGTWGACTAAGTGGAGGTGATAAAGCTAACGAAAMGGCTATCGYGTTAGGTAGGGT 
QVDPXLADXLIHLHYFDCFXDSXIJHP> 

8810 8820 8830 nef 166-195 (190) 8860 8870 8880

*** ^'a** 

TSRGCCWACACGGAATGGAGGATGAGGAWAGGGAAGTGCTGAWATGGAAATTCGATAGCCRTCTGGCTCKCAGGCATATS 
ASYCGGWTGTGCCTTACCTCCTACTCCTWTCCCTTCACGACTWTACCTTTAAGCTATCGGYAGACCGAGMGTCCGTATAS 
XXXHGMEDEXREVLXWKFDSXLAXRHX> 



8890 8900 8910 8920 pol 161-180 (44) 8950 8960



GCT 
CGA 



RVLAEAMS QXXXA 



IVXKEIVA> 



9290 pol 736-765 (83) 9320 9330 9340 9350 9360 

RCTGTGACAAATGCCAGCTCAAGGGTGAGGCTATKCACGGACAGGTGRACTGTAGCCCa TCCGAGGGAWCAAGACAGRCT 
YGACACTGTTTACGGTCGAGTTCCCACTCCGAO'AMGTGCCTGTCCACYTGACATCGGG/ AGGCTCCCTWGTTCTGTCYGA 



DKC QLKGEAXHGQVXC 



E G X R Q X> 



9370 9380 rev 31-60 (126) 9410 9420 9430 9440

AGGARGAACAGACGTAGAAGGTGGCGTGMGAGGCAAAGGCAAATCCRCKCCATCTCCGAGWGGATrCTGGGACAGATRAG 
TCCTYCTTGTCTGCATCTTCCACCGCACKCTCCGTTTCCGTTTAGGYGMGGTAGAGGCTCWCCTAAGAqCCTGTCTAYTC 
RXNRRRRWRXRQRQIXXISEXIL.»GQXR> 

9450 9460 9470 gag 226-255 (16) 9500 9510 9520

ggaacccagaggctccgacattgccggtaccacaagcacactgcaagagcaaatcgsatggatgacaarcaatcccccJr 

CCTTGGGTCTCCGAGGCTGTAACGGCCATGGTGTTCGTGTGACGTTCTCGTTTAGCSTACCTACTGTTYGTTAGGGGG^Y 
EPRGS DIAGTTSTLQEQIXWMTXNPpJ 

9530 9540 9550 9560 pol 841-870 (90) 9600

RCATTMAGCAAGAGTTTGGCATTCCCTATAACCCTCAGTCCCAGGGCGTCGTGGAAAGCATGAACAAAGAGCTCAAGAAA 
YGTAAKTCGTTCTCAAACCGTAAGGGATATTGGGAGTCAGGGTCCCGCAGCACCTTTCGTACTTGTTTCTCGAGTTCTTT 
XIXQE FGIPYNPQSQGVVESMNKELKK> 



CCTATCGAWACCGTCCCCGTCAAGCTCAAGCCTGGCATGGACGGACCCAAAGTGAAACAGTGGCCCCTCAC 
GGATAGCTWTGGCAGG6GCAGTTCGAGTTCGGACCGTACCTGCCTGGGTTTCACTTTGTCACCGGGGAGTG jOIfl 
AS SPIXTVPVKr.kP6MD6PKVKQWPI,T> 

8970 8980 8990 9000 9010 gag 436-465 (30) 9040

CGAAGAGAAAATCAAAGcdATTTGGCCTAGCMRCAAGGGAAGGCCTGGC.AATrTCCYGCAGTCCARGCCTGAGCCTACCG 
GCTTCTCTTTTAGTTTCGaTAAACCGGATCGKYGXTCCCTTCCGGACCGTTAAAGGRCGTCAGGTYCGGACTCGGATGGC 
EEKI KA^IWPS XKGRPGNFXQ SXPE PT> 

9050 9060 9070 9080 9090 vif 31-60 (102) 9120

** *** \ ' * 

CACCCCCAGCCGAGARCTTTRGATTCGGgATTAGCAAAAAGGCTAASGGATGGTTTTACAGACACCA 

GTGGGGGTCGGCTCTYGAAAYCTAAGCCGTAATCGTTTTTCCGATTSCCTACCAAAATGTCTGTGGTAAWGCTWTCGGYT 
APPAEXFXFG'I SKKAXGWFYRHHXXS X> 

9130 9140 9150 9160 9170 9180 9190 9200 

* *.* .* * * * * 

CACCCTAAGGTCAGCTCCGAGGTCCACATTCCCCTCGGcIatGATGACC GCTTGCCAAGGC GTC GGC GGAC C C RGTCACAA 
GTGGGATTCCAGTCGAGGCTCCAGGTGTAAGGGGAGCCaTACTACTGGCGAACGGTTCCGCAGCCGCCTGGGYCAGTGTT 
HPKVSSEVH XPLg'mMTACQGVGGPXH K> 

gag 346-375 (24) 9230 9240 9250 9260 9270 9280 

****** 

agccagggtactggcagaggctatgtcccaggygamcmacgctaacatacctcccattgtgsccaaagagattgtggcaw 
tcggtcccatgaccgtctccgatacagggtccrctkgktgcgattgtajggagggtaacacsggtttctctaacaccgtw 
9630 nef 106-135 (186) 9660 9670 9680



ATCATTGGCAGACAGGAGATCCTCGATCTCTGGGTCTACMATACCCAAGGCTWTTTCCCTGACTGGCASAATTACACACC 
TAGTAACCG TCTGTCCTCTAGGAGCTAGAGACCCAGATGKTATGGGTTCCGAWAAAGGGACTGACCGTSTTAATGTGTGG 
X X g'rQ EI L DL WVYXT QGXFPDWXNYT P> 



9690 9700 9710 9720 rev 46-75 (127) 9750 9760



CGGACCCGGARyCA.GATACp^^B|AGAGMAAGACAGAGACAGATTCRTKCTATTAGCGAAWGGATTCTCAGCAMCTKCC 
GCCTGGGCCTYRGTCTATq^^^STCTCKTTCTGTCTCTGTCTAAGYAMGATAATCGCTTWCCTAAGAGTCGTKGAMGG 
GPGXRYP SRXRQRQIXXISEXILSXX> 



9770 9780 9790 9800 9830 gag 301-330 (21) 9840



TCGGCAGAYCCGCTGAGCCTGTGCCTCTCCAACTOTWTAAGACACTGAGAGCCGAACAGGC'pyCCCAAGASG 

AGCCGTCTRGGCGACTCGGACACGGAGACGTTGAOAWATTCTGTGACTCTCGGCTTGTCCGAWGGGTTCTSCAGTTCTTA 

LGRXAEPVPLQL*XK.TLRAEQAXQXVKN> 



9850 9860 9870 9880 9890 9900 9910 9920



TGGATGACCGASACACTGCTCGTGCAAAACGCTAACCCTGACTG'lpAGARAGTGTATCTGKCTTGGGTCCCCGCTCATAA 
ACCTACTGGCTSTGTGACGAGCACGTTTTGCGATTGGGACTGACiJcTCTYrCACATAGACMGAACCCAGGGGCGAGTATT 
WM T XT DLVQNAN PDC'EXVYLXWVPAH K>- 



pol 676-705 (79) 9950 9960 9970 9980 9990 10000



AGGCATTGGCGGAAACGAACAGGTGGACAAACTGGTCAKCKCTGGCATTAGGAAA ACAGACCCTAACCCTCAGGAARTCS 
TCCGTAACCGCCTTTGCTTGTCCACCTGTTTGACCAGTMGMGACCGTAATCCTTT TGTCTGGGATTGGGAGTCCTTYAGS 
GX GGNEQVDK LVXXGIKKTDPNPQEX> 



10010 env 76-105 (142) 10040 10050 10060 10070 10080



WTCTGGAAAACGTCACCGAGAACTTTAACATGTGGAAAAACRATATGGTGGASCAAATGCAWGAicTGGCTWTGCCA 
WAGACCTTTTGCAGTGGCTCTTGAAATTGTACACCTTTTTGYTATACCACCTSGTTTACGTWCTC^ 

XLENVT EW FNM WKWXMVXQMXE'AG XA I> 



10090 10100 env 170-199 (148) 10130 10140 10150 10160



CTGAAATGCAATRACAAAAMSTTCAACGGAACTGGACCCTGTAMGAATGTGTCCASCGTCCAGTGTACCCATGGgCrWAGA 
GACTTTACGTTAYTGTTTTKSAAGTTGCCTTGACCTGGGACATKCTTACACAGGTSGCAGGTCACATGGGTACcdGWTCT 
LKCNXKXFNGTG PCXNVSXVQCTHg'xE> 



10170 10180 10190 env 600-629 (176) 10220 10230 10240



gctcaagawtagcgctrtctccctgctcaacgctaccgctatcgctgtggctgrgkggaccgataggrttatcgaagtgg 
cgagttctwatcgcgayagagggacgagttgcgatggcgatagcgacaccgacycmcctggctatccyaatagcttcacc 
i.kxsaxsllwataiavaxxtdrxiev> 



10250 



10250 



10270 



10280 



10310 



10320 



vif 46-75 (103) 

ytcacItcccrgcatcccaaagtgtccagcgaagtgcatatccctctgggagasgctaggctcrtcattargacatactgg 
ragtqagggycgtagggtttcacaggtcgcttcacgtatagggagaccctctscgatccgagyagtaao'yctgtatgacc 
xq1sxhpkvssevhiplgxari:.xixtyw> 



10330 



spacers 



CCGGAGGTSTGTCCC CGACG^ 
G L X T G A A 

10410 



10360 



10370 



GGCCTCCASACAGGC GCTGCQ ATGGGCGGTAAATGGTCCAAGWGCTCCCYCGTCGGATGGCCCGMAGTGAGAGAGAGAAT 



nef 1-30 (179) 



10400 



tacccgccatttaccaggttcwcgagggrgcagcctaccgggcktcactctctctctta 
mggkwskxsxvgwpxvreri> 



10420 10430 10440 10450 * pol 496-525 (67) 10480 

***** * 

CAGACRGRCASCCCCTGCCGCTGAGGGAGTG CTCAAGACCGGCAAGI»ACKCTAGGAWGAGGRGTGCCCATACCAATGACG 

gtctgycygtsggggacggcgactccctcacgagttctggccgttcatgmgatcctwctccycacgggtatggttactgc 
rxxxpaaegvlktgkyxrxrxahtnd> 



10490 



10500 



10510 



10520 



10530 



10540 



10550 



tcargcaactgacagmggytgtgcaaaagattgccacagacg 
agtycgttga.ctgtckccracacgttttctaacggtgtctqf" 
v-xqltxxvqkiat; 



10560 B6 

II TGGGAGGSTCTGAAATACTKGKGGAATCTGCTC 

I ACCCTC C S AGACTTTATGAMCMCCTTAGACGAG B 7 

SWEXI»KYXXNLL> | 
10590 



10600 



10610 



10620 



10630 



10640 



env 585-614 (175) 

CWGTACTGGGGCCWGGAACTGAAAAWCTCCGCCRTCAGCCrCCTGAATGCCACAGCC ATTSWGCTGCCTGAGAAAGAWAG 
GWCATGACCCCGGWCCTTGACTTTTWGAGGCGGYAGTCGGAGGACTTACGGTGTCGGTAASWCGACGGACTCTTTCTWTC 
XY WGX EL KX S A XS LLiNATAI X L PEK X 



10650 



10680 



10630 



1Q700 



10710 



S> 
10720 



pci 391»420 (60) 

CTGGACCGTCAACGATATCCAAAAGCTCGTGGGAAAGCTCAACTGGGCATCCCAGATTTACSCC6GA[AGAGCCAa'TGAGG 
GACCTGGCAGTTGCTATAGGTTTTCGAGCACCCTTTCGAGTTGACCCGTAGGGTCTAAATGSGGCCTjrCTCGGTAACTCC 
WTVNDI QKL VGKLNWASQIYXG'RAI E> 



10730 10740 345-374 (159) 



10770 



10780 



10790 



10800 



CTCAGCAACACWTGCTGCAACTGACAGTGTGGGGCATTAAGCAACTGCAAGCCAGAGTGCTCGCCRTTGAGAGATACCTC 

gagtcgttgo'gwacgacgttgactgtcacaccccgtaattcgttgacgttcggtctcacgagcggyaactctctatg gag 

A*QQHXLQliTVWGIKQLQARVI.AXERYI.> 



10810 



10820 



1083 0 pol 631-660 (76) losso 



10870 



losao 



gccctccaggatagcggatyggaagtgaatatcgtcaccgatagccaatacgctctaggcatcattcwggctcagcctga 
cgggaggtcctatcgcctarccttcacttatagcagtggctatcggttatgcgagatccgtagtaagwccgagtcggact 
dsgxevnivtdsqyalgi ixaqpd> 



A li Q 

10890 



logoo 



10910 



10920 env 420-449 (164) losso 



10960 



caraagcIgaaagggaaatctccaactataccartcwgatttacragatcctcaccgaatctcaaaatcaacaggatagga 
gtyttcqctttccctttagaggttgatatggtyagwctaaatgytctaggagtggcttagagttttagttgtcctatcct 

=:*EREISWY TXXIYXID TESQNQQD R> 



X s ' 

10970 



10980 



10990 



11000 



11010 env 285-314 (155) ^^^^^ 



atgagmaagasctcctc gctcccacaarggctaagagaagggtcgtgsaaagggaaaagcgtgccgtcggcmttggcgct 

TACTCKTTCTSGAGGAC CGAGGGTGTTYCCGATTCTC'PTCCCAGCACSTTTCCCTTTTCGCACGGCAGCCGKAACCGCGA 
NEXXI*I.^APTXAKRRVVXREKRAVGXGA> 

11050 11060 11070 11080 



11090 pol 91-120 (40) 



11120 



M X X G 
11130 



atgwttytcggattcctcggcgctgcc aaacccat^atgatcggaggcattggaggctttatcaaagtcaggcagtatga 

TACWAARAGCCTAAGGAGCCGCGACGG TTTGGGTTTTACTAGCCTCCGTAACCTCCGAAATAGTTTCAGTCCGTCATACT 

fi.gaa'kpkmig giggfikvrqyd> 



11140 



11150 



11160 



11170 



11180 



11190 



11200 



X 



CCAAATCMTTATCGAAATCTGTGGAMASAAGGCTATCTCCTACCATAGGCTCAGGGATTTCATTCTGA'TCGYCGCTAGGA 
GGTTTAGKAATAGCTTTAGACACCTKTSTTCCGATAG AGGATGGTATCCGAGTCCCTAAAGTAAGACTAGCRGCGATCCT 
EICGXK AISYHRLRDFI LIXAR> 



env 555-584 (173) i^^^o 



11240 



11250 



11260 



11270 



11280 



YTGTGGAACTGCTCGGCCRTAGCTCCCTGARAGGCCTCCRGAGAGGCACACTGAATGCCTGGGTGAT^GTGRTTGAGGAA 
RACACCTTGACGAGCCGGYATCGAGGGACTYTCCGGAGGYCTCTCCGTGTGACTTACGGACCCACTTTCACYAACTCCTT 
XVEIiUGXSSLXGLXRGTLNAWVKVXEE> 



11290 



gag 151-180 (11) 



11320 



11330 



11340 



113 50 



11360 



AAGGSATTCARTCCCGAAGTGATTCCCATGTTTWCCGCTCTGTCCGAGGGAGCCACJ 
TTCCSTAAGTYAGGGCTTCACTAAGGGTACAAAWGGCGAGACAGGCTCCCTCGGTG' ' 
KXFXPEVIPM 



AGCAACACASCCGCTAA 
TCGTTGTGTSGGCGATT 



J 



11370 



11380 



FXAI*SEGATliESNTXAN> 
11410 11420 11430 11440 



nef 46-75 (182) 

CAATSCCGATTGCGYGTGGCTGRAAGCCCAGGAAGAGGAAGRAGTGGGATTTCCTGTGAGACCCCAAGTGCC1AGAGCCK 
GTTASGGCTAACGCRCACCGACYTTCGGGTCCa?TCTCCTTCYTCACCCTAAAGGACACTCTGGGGTTCACGGATCTCGGM 
NXDCXWLXAQEEEXVGFPVRP QVPRA> 



11450 env 630-651 (178) 



11480 



11490 



spacers 



ggagggctatcctcmacattcccasgaggattaggcaaggcyttgagagagccctcct; 



CCT< 



'CCCGATAGGAGKTGTAAGGGTSCTCCTAATCCGTTCCGRAACTCTCTCGGGAGGATCGGCGC cttaccctatccyaa 



1152 0 



GCCGCC GAATGGGATAGGRTT 



E W D R X> 
gag 211-240 (15) i^^vo iisao 11590 



CACCCTGTGCACGCTGGCCCTRTCSCTCCCGGCCAAATSAGAGAGCCCAGGGGAAGCC3ATATCGCTGGCACAACC 
GTGGGACACGTGCGACCGGGAYAGSGAGGGCCGGTTTASTCTCTCGGGTCCCCTTCGCTATAGCGACCGTGTTGG GAGTC 



11600 



CTCAG 



H P V H 

11610, 



AGPXXPGQXR 
11620 11630 



R G 



nef 76-105 (184) 



11660 



11670 



11680 



GCCCATGACATATAAGGSCGCTRTTGACCTCAGCYTGTTTCTGAAAGAGAAAGGCGGACTGGAWGGCCTCRTCTATAGCM 
CGGGTACTGTATATTCCSGCGAyAACTGGAGTCGRACAAAGACTTTCTCTTTCCGCCTGACCTWCCGGAGYAGATATCGK 
PlITVKXAXDLSLFLKEKGGI.XGi:*XYS> 



spacers^ 



117X0 



11720 



vpr 1-30 (112) 



11750 



11750 



AGAA^ GCTGCT ATGGAACAGGCTCCCGAAGACCAARGCyCTCAGAGAGAGCCTTACAATGAGTGGRCCCTGGAGCTCCTG 
TCTT'^CGACGifTACCTTGTCCGAGGGCTTCTGGTTYCGRGAGTCTCTCTCGGAATGTTACTCACCYGGGACCTCGAGGAC 
MEQA PEDQXXQREPYNEWXL ELL> 



11770 



11780 



11790 



iiaoo 



11810 



pol 481-510 (66) ^^^^2 



GAAGAGCTCAACaiAM6AGGCTCAAGRCCAATGGACCTWCCAAATCTWTCAGGiU\CCCTTTAAGAATCTGAAAACCG^ 

cttctcgagttcktkctccgJgttcyggttacctggawggtttagawagtccttgggaaattcttagacttttggccttt 
eelkxea'qxqwtxqixqepfknlktg k> 



11350 



11850 



11870 



11880 



11890 



11500 



11910 



11920 



gtatkccagaawgagargcgctcacacaaacItggatgacagawaccctcctggtccagaatgccaatcccgattgcaagw 
catamggtcttwctctycgcgagtgtgtttgIacctactgtctwtgggaggaccaggtcttacggttagggctaacgttcw 
Y xrxrxa h tn*wmtx tllvqnanp d.c k> 



gag 316-345 (22) 



11950 



11960 



11970 



11980 



11990 



12000 



CCATCCTCARGGCTCTGGGAMCCGGAGCCWCACTGGAAGAdCCTGAGGTCATCCCTATGTTCWCAGCCCTCAGCGAAGGC 
GGTAGGAGTYCCGAGACCCTKGGCCTCGGWGTGACCTTCTqGGACTCCAGTAGGGATACAAGWGTCGGGAGTCGCTTCCG 
X ILXALG XGAX LEe'p EVI PMFXAL S EG> 



12010 gag 166-195 (12) 12040 



1205O 



12060 



12070 



12080 



gctaccccccaagacctgaataygatgctcaacaycgtcggcggacaccapJtccaccctccaggaacagattgsctggat 
cgatggggggttctggacttatrctacgagttgtrgcagccggctgtggttIaggtgggag gtc cttgt ctaac s gag ct a 
atpqdi.nxmi.nxvgghq'sti.qeqi xwm> 



12090 



12100 gag 241-270 (17) 12130 12140 



12150 



12160 



gacaartaaccctcccrtccctgtcggagasatttacaaaaggtggattatcctcggcctc 
ctgttyattgggagggyagggacagcctctstaaatgttttccacctaataggagccggac 

txnppxpvgxiykrwiilgltr 



t 

C1 

atcccccatcccg . , 
tagggggtagggc join 
I p H p> C2 



12170 



12180 



12190 



pol 241-270 (50) ^2220 



12230 



12240 



ccggcctcaagaaaaagaaaagcgtcaccgtcctggatgtgggagacgcttacttcagcgtccccctcgacraarrc caa 
ggccggagttctttttcttttcgcagtggcaggacctacaccctctgcgaatgaagtcgcagggggagctgyttyyggtt 
aglkkkksvtvi»dvgdayfsvpi*dxxq> 

12320 



12250 



12260 



12270 



12280 



pol 541-570 (70) ^2310 



arggaaacctgggagrcttggtggayggamtactggcaggctacctggatrcctgagtgggagtttgtgaatacccctcc 
tycctttggaccctcygaaccacctrcctkatgaccgtccgatggacctaaggactcaccctcaaacacttatggggagg 
xetwexwwxxywqatwipewefvntpp> 



12330 



12340 12350 123-60 12370 nef 121 -1 50 (1 87) 12400 



cctcgtg tttcccgattggcawaactatacccctggccctggcryaaggtatcccctcacctttggatggtgctttaagc 
gg agcaqaaagggctaaccgtwttgatatggggaccgggaccgyrttccataggggagtggaaacctaccacgaaattcg 
fpdwxnytpgpgxrypltpgwcpk> 



1m V 

124113 



12450 



12430 



12440 



12450 571-600 (72) 12480 



TCGTGCCTGTGGACCCCAAACTGTGGTACCAACTGGAAAAGGAMCCCATTGYCGGAGYCGAAACCTTTTACGTGGACGGA 
AGCACGGACACCTGGGG TTTGACACCATGGTTGACCTTTTCCTKGGGTAACRGCCTCRGCTTTGGAAAATGCACCTGCCT 
L-VPVDPK LWYQL EKX PXXGXETFYVDG> 
12490 X2500 12510 12520 000 1 36-1 65 H 12550 12560 

* *.* ^ ^ + * 

GCCGCCARCAGAGAGACAAAGCTCGGcjcAAAACSYCCAGGGACAGATGGTGCATCAGSCTMTTAGCCCCAGGACCCTCAA 
CGGCGGTYGTCTCTCTGTTTCGAGCCaGTTTTGSRGGTCCCTGTCTACCACGTAGTCSGAKAATCGGGGTCCTGGGAGT']? 

AAXR E tki,g'qnx:qgqmvhqxxsprt I, N> 

1257D 12580 12590 12500 12610 enV 51-90 (141) 12640 

CGCTTGGGTCAAGGTCRTCGAAGAGAAAGSCTTTARCjGAMACCGAAGTGCATAACGTCTGGGCTACCCATGCCTGTGTGC 
GCGAACCCAGTTCCAGYAGCTTCTCTTTCSGAAATYaCTKTGGCTTCACGTATTGCAGACCCGATGGGTACGGACACACG • 
AWVK V XEEKXFX'XTEVHWVWATHA C V> 

12650 12660 12670 12680 12690 12700 12710 12720 

* * * * * * * * 

CTACCGATCCCAATCCCCAAGAGRTTSWCCTGGAGAATGTGACAGAqCTCAAGGATCAGMAAYTCCTCGGCMTTTGGGGA 
GATGGCTAGGGTTAGGGGTTCTCYAASWGGACCTCTTACACTGTCTqGAGTTCCTAGTCKTrRAGGAGCCGKAAACCCCT 

PTDPW pqexxlenvte'lkdqxxlgxwg> 



en V 375-404 {161) 12750 12760 12770 12780 12790 



12800 



TGCTCCGGCAAAMTCATTTGCACAACCRMTGTGCCTTGGAACAGCWCCTGGTCCAAqCMAKCTGGCCATAACAAAGT 
ACGAGGCCGTTTKAGTAAACGTGTTGGYKACACGGAACCTTGTCGWGGACCAGGTTqGKTMGACCGGTATTGTTTCACCC 

csGKx X cttxvpwnsxwsn'xxghwkvo 
12810 vif 136-165 (109) 12940 12850 12860 i287o 12880 

AAGCCTCCAGTATCTGGCTCTGAMGGCTCTGATTAMGCCTAAGAAAATCARACCCCCTCrGCCTAGCjGYTAAGACAATCA 
TTCGGAGGTCATAGACCGA6ACTKCCGAGACTAATKC6GATTCTTTTAGa*YTGGGGGA6ACGGATciRATTCT^^ 
SLQyLALXA LIXPKKIXPPLPslxKT I> 



12890 12900 en V 230-254 (152) 12930 spacers 



TTGTGCATCTGAATRAGTCCGTGGWAATCAATTGCACAAGGCCTARCAATAACACAAGGAMiGCCGCC 
AACACGTAGACTTAYTCAGGCACCWTTAGTrAACGTGTTCCGGATYGTTATTGTGTTCCTK^CGGCGG 
IVHLNX SVXXNCTRPXNNTRX 



12960 \ . 



•fGAAGWA 

:ttcwt join 
" " C3 



A S E X> 



12970 12980 12990 030 106-135 (8) 13020 13030 13040 

CAGAAWAAGTCCMAACAC3AAAACCCAGCAAGCCGCCGCCGATACAGGCARCTCCAGCMAGGTCAGCCAAAACTATCCCAT 
GTCTTWTTCAGGKTTGTCTTTTGGGTCGTTCGGCGGCGGCTATGTCCGTYGAGGTCGKTCCAGTCGGTTTTGATAGGGTA 
QXKSXQKTQQAAADTGXSSXVSQNYP I> 

13050 13060 13070 13080 po| 826-855 (89) 13110 13120 

TGTqTCCAACTTTACCTCCRCCRCTGTGAAAGCCGCTTGTTGGTGGGCCRRTATCMAACAGGAGTTTGGAATCCCTTACA 
ACAcbvGGTTGAAATGGAGGYGGYGACACTTTCGGCGAACAACCACCCGGYYATAGKTTGTCCTCAAACCTTAGGGAATGT 
V'SNFTSXXVKAACWWAXTXQEFGI PY> 

13130 13140 13150 13160 13170 pol 586-61 5 (73) 13200 

ATCCCCAAAGCCAJACATTCTATGTGGATGGCGCTGCCARTAGGGAAACCAAACTGGGAAAGGCTGGCTATGTGACAGAC 
TAGGGGTTTCGGTI TGTAA6ATACACCTACCGCGACGGTYATCCCTTTGGTTTGACCCTTTCCGACCGATACACTGTCTG 
NPQSQTF YVDGAAXRETKI.GKAGYVTD> 

13210 13220 13230 13240 13250 po| 766-795 (85) 13280 

AGAGGCAGACAGAAARTCRTTAGC GGAATCTGGCAGCTCGACTGTACCCATCTGGAAGGCAAARTCATTCTGGTAGCCGT 
TCTCCGTCTGTCTTTYAGYAATCC CCTTAGACCGTCGAGCTGACATGGGTAGACCTTCCGTTTYAGTAAGACCATCGGCA 
RGRQKXXSGXWQLDCTHLEGKXILVAV> 

13290 13300 13310 13320 13330 13340 13350 13360 

* ■* * ^ * * * * 

CCACGTCGCCTCCGGCTACATTGAGGCTGAGGTC GGCAATGAGCAAGTGGATAAGCTCGTGAKTKCCGGAATCAGAAAGG 
GGTGCAGCGGAGGCCGAT6TAACTCCGACTCCAG CCGTTACTCGTTCACCTATTCGAGCACTMAMGGCCTTAGTCTTTCC 
HVASGY .IEAEVGNEQVDKI*VXXGIRK> 

pol 691-720 (80) 13390 13400 13410 13420 13430 13440 

tgctattcctcgacggaatcrataaggctcaggaagagcacgaJgtcagggaaaggattaggcrarccsctcccgctgct 
acgataaggagctgccttagytattccgagtccttctcgtgctllcagtccctttcctaatccgytyggsgagggcgacga 
vlfl0g.i xkaqeehevr erirxxxpaa>- 



FIGURE 15 (Cont) 



nef 16-45 (180) 



13470 



13480 



13490 



13500 



13510 



13520 



GAAGGCGTCGGCGCTGYCTCCCRGGATCTGGATAAGKACGGAGCCMTCACCTCC 
CTTCCGCAGCCGCGACRGAGGGYCCTAGACCTATTCMTGCCTCGGKAGTGGAGC 
EGV-GAXS XDLDKXGAXTS 



ACAAGCGGAACCCAACAGTCCCAGGG 
TGTTCGCCTTGGGTTGTCAGGGTCCC 
TSGTQQSQG> 



13530 rev 91-120 nSO^ 13560 13570 I358O 13590 13600 

AACTGAAACTGGCGTCGGCMRCCCTCAGATTTyGGGAGAGTCCAGCGYTRTCCTCGGCYCCGGCTCCATCGTCATCTGGG 
TTGACTTTGACCGCAGCCGKYGGGAGTCTAAARCCCTCTCAGGTCGCRAYAGGAGCCGRGGCCCAGGTAGCAGTAGACCC 
TBTGVGX P Q IXGESSXXLGXG'SIVXW> 



13610 13620 poi 526-555 (69) 



13 650 



13 660 



spacers 



GTAAAACCCCTAAGTTTARGCTCCCCATTCAGARAGAGACATGGGAARCCTGGTGGAYGGASTATTGGCAAGCC GCTGCT 
CATTTTGGGGATTCAAATYCGAGGGGTAAGTCTYTCTCTGTACCCTTYGGACCACCTRCCTSATAACCGTTCGG CGACGA 
GKTPKFXI.P3:QXETWEXWWXXyWQAAA> 



13690 



13700 



13710 env 140-169 (146) i374o 



13750 



13760 



tacagactgatcarctgtaacacaagcgytatcamacaggcttgccctaagrttasctttgascctatccctatccatta 
atgtctgactagtygacattgtgttcgcratagtktgtccgaacgggattcyaatsgaaactsggatagggataggtaat 

YRr.XXCNTSXIXQACPKXXFXPIPIHY> 



13770 

CTGTGCCCC 
GACACQGGG ^WM 
C A 



13780 



13790 



13800 pol 376-405 (59) 13830 X3840 



C3 



^tggatgggctatgagctccaccctgacagatggacagtgcaacccatcswgctccccgaaaagg 

acctacccgatactcgaggtgggactgtctacctgtcacgttgggtagswcgaggggcttttcc join 

C4 



ppswmgyel.hpdrwtvqpixlpek> 

13850 13860 13870 138B0 13890 030 331-360 (23) 13920 

* It . * *■ * ic 

astcctggacagtgaatgacattcaqaaawcaattctgaragccctcggcmcaggcgctwccctggaggaaatgatgaca 
tsaggacctgtcacttactgtaagtcfrttwgttaagactytcgggagccgkgtccgcgawgggaccojcctttac 



XSWTVNDIQ'KX 
13930 13940 13950 



I L X A L G X 
13960 13970 



;axleemmt> 

13980 13990 14000 



GCATGTCAGGGAGTGGGAGGCCCTRGCCATAAGGCa AGAGTGTATTACAGAGACTCCAGGGACCC CMTTTGGAAAGGCC C 
CGTACAGTCCCTCACCCTCCGGGAYCGGTATTCCGA TCTCAC ATAATGTCTCTG AGGTCCCTGGGGKAAACCTTTC C GGG 
ACQGVGGPXHKARVYYRDSRDPXWKG P> 

14030 14040 



pol 931-960 (96) 



14050 



14060 



14070 



14060 



TGCCAAACTGCTCTGGAAAGGCGAAGGCGCTGTGGTCATCCAAGACRTTAAGATTGGAGGCCAACTGAWAGAAGCCCTCC 



ACGGTTTGACGAGACCTTTCCGCTTCCGCGACACCAGTAGGTTCTG 
AKLLWKGEGAVVIQD 



YAATTCTAACCTCCGGTTGACTWTCTTCGGGAGG 
XKXGGQLXEAL> 



14090 



pol 61-90 (38) 



14120 



14130 



14140 



14150 



14160 



TGGATACAGGAGCCGATGACACCGTCCTGGAAGAWATSAATCTGCCTGGCARGTGG GGAATCAAACAGCTCCAGGCTAGG 
ACCTATGTCCTCGGCTACTGTGGCAGGACCTTCTWTASTTAGACGGACCGTYCACC CCTTAGTTTGTCGAGGTCCGATCC 
LDTGADDTVLEXXNLPGXW 



14170 14180 en V 360-389 (160) i42io 



6 I 
14220 



K Q I* Q A R> 

spacers^ 



GTCCTGGCTRTCGAGAGGTATCTGAMGATCAAMAGYTTCTGGGAMTCTGGGGCTGTAGCGGAAAC GCTGCI ATGGAAAA 
CAGGACCGAYAGCTCTCCATAGACTTTCTAGTTKTCRAAGACCCTKAGACCCCGACATCGCCTTTC CGACG^(tACCTTTT 

vlaxerylkdqxxlg'xwgcsgk a a 



14250 



14260 



14270 



vif 1-30 (100) 



14300 



14310 



M E N> 
14320 



cagatggcaagtgmtgatcgtctggcaagtggacaggatgargattaggacatggaawagcctcgtgaaacaccatatgy 
gtctaccgttcac kactagcagaccgttcacctgtc ctact yc taatcctgt ac cttwtcgga6c actttgtggtat acr 

RWQVX IVWQVDRMXIRTWXSLVKHHM> 



14330 



14340 



14350 14350 en V 390-41 9 (1 62) ^^390 



14400 



aimttatctgtaccacarmcgtcccctggaactccasctggagcaataagtccytcgaagagatttggrataacatgacc 
t^kaatagacatggtgtykgcaggggaccttgaggtsgacctcgttattcaggragcttctctaaaccytattgtactgg 

XXICTTXVPWNSXWSNKSXEEI WXWMT> 
14410 14420 14430 VDU 16-45 (133) 14460 14470 14480 

*j* * * * * 

TGGATKSAATGdCTGATTMTCGCTATCGTCGTGTGGACCATTGYGTWTATCGAATACARGAAACTGCTCARGCAAAGGAR 
ACCTAMSTTACdGACTAAKAGCGATAGCAGCACACCTGGTAACRCAWATAGCTTATGTYCTTTGACGAGTYCGTTTCCTY 
WX:Xw'LIXAtVVWTIXXIEYXKi:,LXQRX> 

14490 14500 14510 14520 qaq 46-76 " 14550 14560 

AATCGATAGGCTCATCRAAAGG CTCAACCCTGGCCTCCTGGAAACCKCTGAGGGATGTMAACAGATCCTGGRACAGCTCC 
TTAGCTATCCGAGTAGYTTTCC GAGTTGGGACCGGAGGACCTTTGGMGACTCCCTACAKTTGTCTAGGACCYTGTCGAGG 



I D R t, I X .R 



IiNPGLLETXEGCXQXLXQL> 



14570 14580 14590 14600 14610 14620 14630 



14640 



AGYCCGCCCTCMAGACAGGCWCCGAAGAGCT 
TCRGGCGGGAGKTCTGTCCGWGGCTTCTCGAC 



i * * * C4 

|AGAAAGCTCCrGARACAGAGAARGATTGACAGACTGATTRAG . . 
gTCTTTCGAGGACTYTGTCTCTTYCTAACTGTCTGACTAAYTC JO I H 



QXALXTGXEEI.SSRKLLXQRXIDRI.IX> C5 
VpU 31-60 (134) 14670 14680 14690 14700 14710 14720 



AGAAYCAGAGAGAGAGCCGAAGACTCCGGCAATGAGTCCGAGGGAGAckcACCCGGAATCAGATACCAATACAATGTGC'r 
TCTTRGTCTCTCTCTCGGipTTCTGAGGCCGTTACTCAGGCTCCCTCTqTGTGGGCCTTAGTCTATGGTTATGTTACACGA 
RX RERAEDSGNESEGD^TPGI RYQYWVL> 

14730 POI 286-315 f53) 14760 14770 14780 14790 14800 

* ^ ^ * * ■* *r * 

CCCCCAAGGCTGGAAGGGCTCCCCASCCATTTTCCAAAGCTCCATGMCCMAAATCCTckTGATGCAAAGGGGAAACTTTA 
GGGGGTTCCGACCTTCCCGAGGGGTSGGTAAAAGGTTTCGAGGTACKGGKTTTAG6AOTACTACGTTTCCCCTTTGAAAT 

pqgwkgsp xxfqssmxxil'mmqrgnp> 

14810 14B20 gaq 376-405 f26) ^^SSO 14860 14870 14880 

* * ^ ^ J * ^ ^ ^• 

RGGGACMGAAAAGGATTRTCAAGTGCTTCAACTGTGGAAAGGAAGGCCATMTCGCTARGAATTGCAGacCTCCCCTGGAG 
YCCCTGKCTTTTCCTAAYAGTTCACGAAGTTGACACCTTTCCTTCCGGTAKAGCGATYCTTAACGTCaGGAGGGGACCTC 
XGXKRIXKCFNCGKEGHXAXWCRPPLE> 

14890 .14900 14910 re V 76-105 (129) l^SSO 14960 

agactgmacctggattgctccgaggatwgcgrcacctccggcacacagcaaagccaaggcacagagacaggagtgggJct 
tctgacktggacctaacgaggctcctawcgcygtggaggccgtgtgtcgtrtcggttccgtgtctctgtcctcacccdga 
rlxldcsedxxtsg3?qqsqgtetgvg'l> 

14970 14980 14990 15000 pol 781-810 (86) ^^030 15040 

cgtggctgtgcatgtggccagcggatatatcgaagccgaagtgatccctgccgaaactggacaggaaaccgcttactttm 
gcaccgacacgtacaccggtcgcctatatagcttcggcttcactagggacggctttgacctgtcctttggcgaatgaaak 

VAVHVAS GYIEAEVIPAETGQETAYF> 
15050 15060 15070 15080 15090 etlV 200-229 fl 50) 15X20 

I* * * * * -V ■/e 

TCCTCAA^TTARGCCTGTGGTCAGCACACAGCTCCTGCTCAACGGTAGCCTCGCTGAAGAGGAARTCRTTATCAGAAGC 
AGGAGTTCjrAATYCGGACACCAGTCGTGTGTCGAGGACGAGTTGCCATCGGAGCGACTTCTCCTTYAGYAATAGTCTTCG 
XLK*XXPVVSTQLi:.I,WGSLAEEEXXIRS> 

15130 15140 15150 15160 15170 pol 406-435 (61) ^^200 

GAAAACYTTACCRATAACAAACTGGTCGGCAAACTGAATTGGGCTTCCCAAATCTACSCTGGCATCAAAGTGARGCAACT 
CTTTTGRAATGGYTATTGTTTGACCAGCCGTTTGACTTAACCCGAAGGGTTTAGATGSGACCGTAGTTTCACTYCGTTGA 
ENXTXNKr.VGKLNWASQIYXG IKVXQL> 

15210 15220 15230 15240 15250 GHV 121 -1 39 (145) 15280 

GTGTAAGCTCCTGAGAGGCRCCAAAGCCCTCACCCCTCTGTGTGTGACACTGAATTGCACAAACGCTAACCTCATCAATG 
CACATTCGAGGACTCTCCGYGGTTTCGG GAGTGGGGAGACACACACTGTGACTTAACGTGTTTGCGATTGGAGTAGTTAC 
CKLLRGXKALTp'riCVTLNCTNANLXW> 

pacers 15310 15320 15330 -ro ^r^o /^^^x 15360 



ACTT/ 
V N 



tat 76-102 (123) 



TGAA-J GCTGCT CAAMCCAGAGGCGATAACCCTACCGRTCCCRAAGAGTCCAAGAAARAGGTCGMGTCCAAGRCAGAGACA 
" CGACG/GTTKGGTCTCCGCTATTGGGATGGCYAGGGYTTCTCAGGTTCTTTYTCCAGCKCAGGTTCYGTCTCTGT 
AAQXRG DNPTXPXESKKXVXSKXET> 



FIGURE 15 (Cont) 



spacers 



15390 



15400 



GACCCTTKTGAC GCCGC 
CTGGGAAMACTqCGGCGG 
D P X D 



rev 61-90 (128) 



15430 



15440 



CCCAMCTKTCTGGGAAGGYCTGCCGAACCCGTCCCCCTCCAGCTCCCCCCTCTGGA 
VGGTKGAMAGACCCTTCCRGACGGCTTGGGCAGGGGGAGGTCGAGGGGGGAGACCT 
PSSXXLGRXAEPVPLQLPPX*E> 



I 



15450 



154S0 



154'?0 



15480 



15490 



15500 



15510 



15520 



AAGGCTCMACCTCGACTGTAGCGAAGACWGTGRC GMACTGGATAAGTGGGCCTCCCTGTGGAACTGGTTCRATATCWCCA 
TTCCGAGKTGGAGCTGACATCGCTTCTGWCACYC CKTGACCTATTCACCCGGAGGGACACCTTGACCAAGYTATAGWGGT 
R LXLDCS EDXXXLDKWASLWNWFXI X> 



15550 



155S0 



15570 



15580 



15590 



15600 



env 450-479 (166) . * * . * 

ASTGGCTGTGGTACATTAAGATTTTCATTATGATTGTGGGAGGgAATAAGATTGTCAGGAT6TACyMACCTGTCTCCAT.C 
TSACCGACACCATGTAATTCTAAAAGTAATACTAACACCCTCCGTTATTCTAACAGTCCTACATGRICTGGACAGAGGTAG 
XWI*WYIKIFIM1VGG*NKIVRMYXPVSI> 



X W I* 

15610 



15640 



15650 



15660 



15 670 



15680 



gag 271-300 (19) 

ctcgacattargcaaggccctaaggaacccttcagggattacgtggacagattcgctaagctcctgtggaagggagaggg 
gagctgtaatycgttccgggattccttgggaagtccctaatgcacctgtctaag cgattcgaggacaccttccctctccc 
Ldixqgpkepfrdyvdrfakli*wkgeg> 

15690 15700 pol 946-975 (97) ^^"^^^ ^^'^^^ ^^"^^^ 

agccgtcgtgattcaggacaactccgacattaaggtcgtgcccaggagaaaggctaagattatc gaactgaataagagaa 
tcggcagcactaagtcctgttgaggctgtaattccagcacgggtcctctttccgattctaatagcttgacttattctctt 
a'vv'x qdwsdikvvp rrkakii " " 



15770 



15780 



15790 



pol 226-255 (49) 



15820 



E I* N K R> 

spacers 



cccaagacttttgggaagtgcaactgggaatccctcaccctgctggactgaaaaagaaaaagtccgtgacagtc gccgct 

GGGTTCTGAAAACCCTTCACGTTGACCerTAGGGAGTGGGACGACCTGACTTTTTCTTTTTCAGGCACTGTCAC CGGCGf 
TQDFWEVQLGI PHPAGLKKKKSVT V | A A > 



15850 



15860 



15870 



15880 



15910 



15920 



env 1-30 (137) 

ATGAGAGTGAAAGAGACACAGATGAACTGGCCCAATCTGTGGARGTGGGGCACAMTGATTCTGGGAMTGGTCATSATTTG 

tactctcactttctctgtgtctacttgaccgggttagacacctycaccccgtgtkactaagaccctkaccagtastaaac 

MRVKETQMNWPNLWXWGTXILGXVX IC> 



15930 15940 15950 15960 15970 po| 421-450 (62) 

* * * * * ^ 



16000 



gaggcggagg 

S A S 



ctccgcctccattaaggtcaracagctctgcaaactgctcaggggtrcaaaggctctgacagasattgtgmcactgacag 



TAATTCCAGTYTGTCGAGACGTTTGACGAGTCCCCAYGTT'I^CCGAGACTGTCTSTAACACKGTGACTGTC 
IKVXQI>CKLLRGXKALTXXVXLT> 



16010 16020 

1 

AGGAAGCCGAACTGGAACTG 
TCCTTCGGCTTGACCTTGAC 
E E A E I. E L 



16030 16040 16050 ncf 1 81 -1 96 (1 91 ) 16080 

* * * * 

CTCAWATGGAAGTTTGACTCCCRCCTCGCCCKGAGACATATSGCCAGGGAACTGCRTCCC 
GAGTWTACCTTCAAACTGAGGGYGGAGCGGGMCTCTGTATASCGGTCCCTTGACGYAGGG 
LXWK F DSXLAXRHXARELXP> 



16090 



spacers 



16120 16130 env 570-599 (174) 



16160 



GAGTWCTACAAAGACTGC GCTGCIGTCGAGCTCCTGGGACRCTCCAGCCTCARGGGACTGCRAAGGGGATGGGAAGSCCT 
CTCAWGATGTTTCTGACGCGACGi^CAGCTCGAGGACCCTGYGAGGTCGGAGTYCCCTGACGYTTCCCCTACCCTTCSGGA 
EX YKDCAAVELriGXSSLXGLXRGWKXI.> 



16240 



16170 16180 16190 16200 16210 16220 16230 

* * * * * rfi 

CAAGTATTKGKGGAACCTCCTGCWGTATTGGGGC '^^^ CTGGRGCAACTGCAAYCTGCTCTGMAAACCGGAWCAGAGG 
/* Scramble */ 

/* Includes */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Constant definitions */

/* Version information */
#define VERSION_NO            "0.2"
#define VERSION_DATE          "04/03/1999"

/* Misc */
#define KEYBOARD_BUFFER_SIZE  256    /* size of keyboard read buffer */
#define LEN_CODON             4      /* length of codon (including null) */
#define BUFFER_SIZE           10000  /* size of file read buffer */
#define TRUE                  1      /* boolean true */
#define FALSE                 0      /* boolean false */

/* Error codes */
#define E_NOERROR             0      /* no error */
#define E_NOINFILE            1      /* genes file not found */
#define E_MALLOC              2      /* memory allocation error */
#define E_FILEREAD            3      /* file read error */
#define E_CREATE_OUTPUT_FILE  4      /* error creating output file */
#define E_OVERLAP             5      /* segment overlap >= length */

/* Structure definitions */


typedef struct gene GENE; 
typedef GENE * P_GENE; 

typedef struct gene_segment GENE_SEGMENT; 
typedef GENE_SEGMENT * P_GENE_SEGMENT; 
struct gene {
    char * name;
    char * data;
    P_GENE next_gene;
};

struct gene_segment {
    P_GENE p_gene;
    int number;
    int offset;
    int first_codon_choice;
    char * amino_data;
    char * dna_data;
    P_GENE_SEGMENT next_seg;
};



Figure 25 






/* Function prototypes */

int prolog();
int get_parameters();
int read_int(char * prompt);
int load_genes();
int add_gene(char * gene_name,char * gene_data);
void insert_gene(P_GENE * head,P_GENE new_gene);
int add_aa();
int split_genes();
int split_gene(P_GENE g);
int insert_segment(P_GENE_SEGMENT * head_seg,P_GENE_SEGMENT new_seg);
int convert_segments_aa_to_dna();
int convert_aa_to_dna(char * aa_ptr,char * dna_ptr,int first_choice);
char * codon(char acid_char,int preferred);
int perform_scramble();
int scramble_segments();
int adjacent_segments();
int display_genes();
int write_output_file();
void strip_newline(char * strip_str);
void pad_amino_string(char * amino_ptr, char * padded_ptr);
int even(int test_num);
void read_str(char * prompt,char * string);
char * read_nonblank_line(char * buf,int buf_size,FILE * in_file);
int user_confirmation();
void test();

/* Global variables */ 

char * codon_table[26][2] = {
/* A 00 */ {"GCC","GCT"},
/* - 01 */ {"???","???"},
/* C 02 */ {"TGC","TGT"},
/* D 03 */ {"GAC","GAT"},
/* E 04 */ {"GAG","GAA"},
/* F 05 */ {"TTC","TTT"},
/* G 06 */ {"GGC","GGA"},
/* H 07 */ {"CAC","CAT"},
/* I 08 */ {"ATC","ATT"},
/* - 09 */ {"???","???"},
/* K 10 */ {"AAG","AAA"},
/* L 11 */ {"CTG","CTC"},
/* M 12 */ {"ATG","ATG"},
/* N 13 */ {"AAC","AAT"},
/* - 14 */ {"???","???"},
/* P 15 */ {"CCC","CCT"},
/* Q 16 */ {"CAG","CAA"},
/* R 17 */ {"AGG","AGA"},
/* S 18 */ {"AGC","TCC"},
/* T 19 */ {"ACC","ACA"},
/* - 20 */ {"???","???"},
/* V 21 */ {"GTG","GTC"},
/* W 22 */ {"TGG","TGG"},
/* - 23 */ {"???","???"},
/* Y 24 */ {"TAC","TAT"},
/* - 25 */ {"???","???"}
};

char * error_text[] = {
/* 00 */ ""
/* 01 */ ,"ERROR: Input file not found!"
/* 02 */ ,"ERROR: Memory allocation error"
/* 03 */ ,"ERROR: File read error"
/* 04 */ ,"ERROR: Could not create output file"
/* 05 */ ,"ERROR: Segment overlap must be less than segment length"
};

char disease_name[KEYBOARD_BUFFER_SIZE];
char input_file_name[KEYBOARD_BUFFER_SIZE];
char output_file_name[KEYBOARD_BUFFER_SIZE];
int num_genes = 0;
int num_segments = 0;
int len_segment;
int segment_overlap;
P_GENE first_gene = NULL;
P_GENE_SEGMENT first_segment = NULL;
P_GENE_SEGMENT * scrambled_segments = NULL;

/* Mainline */

void main() {
    int error = E_NOERROR;

    printf("Scramble - Version %s. %s\n\n",VERSION_NO,VERSION_DATE);

    /* Initial processing */
    if (!error)
        error = prolog();

    /* Get various program parameters from user */
    if (!error)
        error = get_parameters();

    /* Load genes from genes file */
    if (!error)
        error = load_genes();

    /* Add 'AA' to start and end of all genes */
    if (!error)
        error = add_aa();

    /* Split genes into overlapping chunks */
    if (!error)
        error = split_genes();

    /* Convert segment amino acid to dna */
    if (!error)
        error = convert_segments_aa_to_dna();

    /* Scramble the segments */
    if (!error)
        error = perform_scramble();

    /* Write output file */
    if (!error)
        error = write_output_file();

    /* Show error if there was one */
    if (error)
        printf("%s\n",error_text[error]);
}



/* prolog() */

/* Perform any initial processing required */
int prolog() {
    /* Seed the random number generator, using the system clock */
    /* Don't run the program more than once in the same second! */
    /* Or we'll get the same randomisation!!! */
    srand(time(NULL));

    return E_NOERROR;
}
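
The warning in the prolog comments follows from how the C library generator behaves: seeding it twice with the same value replays the same sequence of draws, and time() usually counts whole seconds, so two runs started within the same second would scramble identically. A minimal sketch of that behaviour, using a hypothetical helper seeded_draws that is not part of the Scramble listing:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper (not in the original listing):
   seed the generator, then record the next n draws in out. */
void seeded_draws(unsigned int seed, int *out, size_t n)
{
    size_t i;

    assert(out != NULL);
    srand(seed);                /* same seed => same sequence of rand() values */
    for (i = 0; i < n; i++)
        out[i] = rand();
}
```

Calling seeded_draws twice with one seed fills both arrays with identical values, which is the situation the comment in prolog() warns about.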



/* get_parameters() */

/* Ask for various parameters from the user (stdin) */

int get_parameters() {
    int valid;

    /* Disease name */
    read_str("Enter disease name : ",disease_name);
    /* Input file name */
    read_str("Enter input file name : ",input_file_name);
    /* Output file name */
    read_str("Enter output file name : ",output_file_name);

    /* Segment length */
    valid = FALSE;
    while (!valid) {
        len_segment = read_int("Enter segment length : ");
        if (len_segment % 2)
            printf("Segment length must be even!\n");
        else
            valid = TRUE;
    }

    /* The overlap is half the segment length */
    segment_overlap = len_segment / 2;

    return E_NOERROR;
}




/* load_genes() */

/* Load the genes from the input file */

int load_genes() {
    FILE * input_file;
    char name_buf[BUFFER_SIZE];
    char data_buf[BUFFER_SIZE];
    int rc = E_NOERROR;    /* stays E_NOERROR if the file holds no genes */

    /* Open genes file for reading */
    if (NULL == (input_file = fopen(input_file_name,"r")))
        return E_NOINFILE;

    printf("Loading genes from: %s\n",input_file_name);

    num_genes = 0;

    /* Read gene name */
    while (NULL != read_nonblank_line(name_buf,BUFFER_SIZE,input_file)) {
        /* Read the gene data */
        if (NULL != read_nonblank_line(data_buf,BUFFER_SIZE,input_file)) {
            /* Allocate memory for new gene and add to list */
            if ((rc = add_gene(name_buf,data_buf)))
                break;
        }
    }

    /* Close genes file */
    fclose(input_file);

    return rc;
}

/* add_gene() */

/* Allocate memory for new gene, then insert in list */

int add_gene(char * gene_name,char * gene_data) {
    P_GENE new_gene;

    /* Allocate storage for new gene */
    if (NULL == (new_gene = malloc(sizeof(GENE))))
        return E_MALLOC;
    /* Initialise new gene */
    new_gene->next_gene = NULL;
    /* Allocate storage for gene name (+1 for null) */
    if (NULL == (new_gene->name = malloc(strlen(gene_name)+1)))
        return E_MALLOC;
    /* Store gene name */
    strcpy(new_gene->name,gene_name);
    /* Allocate storage for gene data (+1 for null) */
    if (NULL == (new_gene->data = malloc(strlen(gene_data)+1)))
        return E_MALLOC;
    /* Store gene data */
    strcpy(new_gene->data,gene_data);
    /* Insert the new gene into linked list */
    insert_gene(&first_gene,new_gene);
    /* Increment num_genes */
    num_genes++;

    return E_NOERROR;
}

/* insert_gene() */

/* Insert gene into linked list */

void insert_gene(P_GENE * head_gene,P_GENE new_gene) {
    P_GENE * cur_ptr = head_gene;

    while (NULL != (*cur_ptr))
        cur_ptr = &((*cur_ptr)->next_gene);

    *cur_ptr = new_gene;
}
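
The insertion routine above walks a pointer to the link field rather than a pointer to the node, so an empty list and a non-empty list are handled by the same assignment. The following stand-alone sketch shows the same idiom on a hypothetical struct node; the type and the function name append are illustrative and do not appear in the Scramble listing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list node used only for this illustration. */
struct node {
    int value;
    struct node *next;
};

/* Append n to the end of the list whose head pointer is *head. */
void append(struct node **head, struct node *n)
{
    struct node **link = head;      /* address of the link we may overwrite */

    assert(head != NULL && n != NULL);
    while (*link != NULL)           /* advance to the terminating NULL link */
        link = &(*link)->next;
    n->next = NULL;
    *link = n;                      /* works for both empty and non-empty lists */
}
```

Because link starts at the head pointer itself, appending to an empty list simply overwrites the head, with no special case.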

/* add_aa() */

/* Add 'AA' to the start and end of every gene */

int add_aa() {
    P_GENE cur_gene = first_gene;
    char * new_data;

    while (NULL != cur_gene) {
        /* Allocate storage to fit the gene plus four characters */
        new_data = malloc(strlen(cur_gene->data)+5);

        /* Shift gene data to new storage, add "AA" */
        strcpy(new_data,"AA");
        strcat(new_data,cur_gene->data);
        strcat(new_data,"AA");

        /* Free previous gene data storage */
        free(cur_gene->data);

        /* Set gene data pointer to new storage */
        cur_gene->data = new_data;

        /* Advance to next gene */
        cur_gene = cur_gene->next_gene;
    }

    return E_NOERROR;
}
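
The padding step can be sketched in isolation as follows. This is an illustrative variant, assuming a hypothetical function pad_with_aa that returns a fresh string instead of updating a gene record, and that checks the allocation result, which the listing above does not do.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not in the original listing): return a newly
   allocated copy of seq flanked by "AA", or NULL if allocation fails.
   The caller frees the result. */
char *pad_with_aa(const char *seq)
{
    size_t len;
    char *padded;

    assert(seq != NULL);
    len = strlen(seq);
    padded = malloc(len + 5);           /* "AA" + seq + "AA" + terminating null */
    if (padded == NULL)
        return NULL;
    memcpy(padded, "AA", 2);
    memcpy(padded + 2, seq, len);
    memcpy(padded + 2 + len, "AA", 3);  /* third byte is the terminating null */
    return padded;
}
```

For example, padding the sequence MGGK in this way gives AAMGGKAA.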

/* split_genes() */

/* Split the genes into overlapping segments */

int split_genes() {
    P_GENE cur_gene = first_gene;
    P_GENE_SEGMENT cur_seg = first_segment;

    printf("Splitting genes into segments...\n");

    /* Split the genes into segments */
    while (NULL != cur_gene) {
        /* Split the gene */
        split_gene(cur_gene);
        /* Advance to next gene */
        cur_gene = cur_gene->next_gene;
    }

    /* Count the number of segments */
    num_segments = 0;
    cur_seg = first_segment;
    while (NULL != cur_seg) {
        num_segments++;
        cur_seg = cur_seg->next_seg;
    }

    return E_NOERROR;
}

/* split_gene() */

/* Split a gene into overlapping segments */

int split_gene(P_GENE g) {
    char * seg_ptr;
    char * seg_buf;
    P_GENE_SEGMENT new_segment = NULL;
    int done;
    int seg_ctr = 0;

    /* Allocate memory for segment buffer */
    if (NULL == (seg_buf = malloc(len_segment+1)))
        return E_MALLOC;

    /* Insert a null at the end of the segment buffer, */
    /* so we can use it as a string */
    seg_buf[len_segment] = '\0';

    /* Set segment pointer to start of gene data */
    seg_ptr = g->data;

    done = FALSE;
    while (!(done)) {
        /* So we know if we copied data */
        seg_buf[0] = '\0';

        /* Copy a segment of gene data to the segment buffer */
        /* (strncpy stops at the gene's terminating null, so it */
        /* never reads past the end of the gene data) */
        strncpy(seg_buf,seg_ptr,len_segment);

        /* If there was some gene data copied to the buffer */
        if ('\0' != seg_buf[0]) {
            /* Allocate storage for a new segment */
            if (NULL == (new_segment = malloc(sizeof(GENE_SEGMENT))))
                return E_MALLOC;
            /* Increment segment counter */
            seg_ctr++;

            /* Setup the new segment */
            new_segment->p_gene = g;
            new_segment->number = seg_ctr;
            new_segment->offset = seg_ptr - g->data + 1;
            new_segment->next_seg = NULL;
            if (NULL == (new_segment->amino_data = malloc(len_segment+1)))
                return E_MALLOC;
            if (NULL == (new_segment->dna_data = malloc(len_segment*3+1)))
                return E_MALLOC;
            new_segment->amino_data[0] = '\0';
            new_segment->dna_data[0] = '\0';
            /* Copy segment data from buffer to new segment */
            strcpy(new_segment->amino_data,seg_buf);
            /* Insert new segment into chain from gene */
            insert_segment(&first_segment,new_segment);
        }

        /* If we didn't read a full segment, we are finished! */
        if (strlen(seg_buf) < len_segment)
            done = TRUE;
        /* Otherwise, advance segment pointer to next segment in buffer */
        else
            seg_ptr = seg_ptr + len_segment - segment_overlap;
    }

    return E_NOERROR;
}
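
The windowing rule used when splitting a gene can be illustrated on its own: take up to one segment length of residues, stop as soon as a short segment is read, and otherwise step forward by the segment length minus the overlap. The sketch below is an assumption-laden illustration, not the listing's code; count_segments and copy_segment are hypothetical names, and the arithmetic mirrors the loop described above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the segment that starts at 0-based offset
   start into out (which must hold seg_len + 1 bytes) and return the
   number of residues copied. The caller ensures start <= strlen(seq). */
size_t copy_segment(const char *seq, size_t start, size_t seg_len, char *out)
{
    size_t remaining = strlen(seq) - start;
    size_t n = remaining < seg_len ? remaining : seg_len;

    memcpy(out, seq + start, n);
    out[n] = '\0';
    return n;
}

/* Hypothetical helper: count the segments produced for seq when windows
   of seg_len residues overlap their predecessor by overlap residues. */
size_t count_segments(const char *seq, size_t seg_len, size_t overlap)
{
    size_t len = strlen(seq);
    size_t start = 0;
    size_t count = 0;

    assert(seg_len > 0 && overlap < seg_len);
    for (;;) {
        size_t n = (len - start < seg_len) ? len - start : seg_len;

        if (n > 0)
            count++;                    /* a non-empty segment was read */
        if (n < seg_len)
            break;                      /* short read: end of the sequence */
        start += seg_len - overlap;     /* step to the next overlapping window */
    }
    return count;
}
```

With a ten-residue sequence, a segment length of 4 and an overlap of 2, the windows start at offsets 0, 2, 4, 6 and 8, and the last window holds only the final two residues.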

/* insert_segment() */

/* Insert a segment node at the end of the list */

int insert_segment(P_GENE_SEGMENT * head_seg,P_GENE_SEGMENT new_seg) {
    P_GENE_SEGMENT * cur_ptr = head_seg;

    while (NULL != (*cur_ptr))
        cur_ptr = &((*cur_ptr)->next_seg);

    *cur_ptr = new_seg;

    return E_NOERROR;
}

/* convert_segments_aa_to_dna */

/* Go thru segments, and for each, convert amino acids to dna */

int convert_segments_aa_to_dna() {
    P_GENE_SEGMENT cur_seg = first_segment;
    int first_choice = 1;
    int alternate;

    printf("Converting to DNA...\n");

    /* Work out if we need to alternate the first codon choice or not */
    /* Don't need to do this anymore, since the segment length is */
    /* forced to be even, and the overlap is half the length (odd). */
    /*alternate = ((even(len_segment) && even(segment_overlap))
                || (!even(len_segment) && !even(segment_overlap)));*/

    alternate = FALSE;

    while (NULL != cur_seg) {
        cur_seg->first_codon_choice = first_choice;
        convert_aa_to_dna(cur_seg->amino_data,cur_seg->dna_data,
                          cur_seg->first_codon_choice);
        /* Address next segment */
        cur_seg = cur_seg->next_seg;

        /* If we are alternating, alternate the first codon choice */
        /*if (alternate)
            if (1 == first_choice)
                first_choice = 2;
            else
                first_choice = 1;*/
    }

    return E_NOERROR;
}

/* convert_aa_to_dna */

/* Converts a string of amino acid to dna */

/* NOTE: assumes that buffer at dna_ptr is large enough to hold dna!!! */

int convert_aa_to_dna(char * aa_ptr,char * dna_ptr,int first_choice) {
    char * p_codon;
    int cur_preferred = first_choice;

    while ('\0' != *aa_ptr) {

        p_codon = codon(*aa_ptr,cur_preferred);

        strcat(dna_ptr,p_codon);

        /* If we didn't find a codon, log a warning */
        if (0 == strcmp(p_codon,"???"))
            printf("WARNING: no codon found for amino acid!\n");

        /* Alternate current preferred codon */
        if (1 == cur_preferred)
            cur_preferred = 2;
        else
            cur_preferred = 1;

        aa_ptr++;

    }

    return E_NOERROR;

}

/* codon */

/* Returns a pointer to a codon corresponding to the amino acid passed */
/* The codon pointer is to 3 characters, plus a terminating null */

char * codon(char acid_char,int preferred) {
    int codon_table_index;
    char * codon_ptr;

    /* Determine index into codon_table (table starts at 'A') */
    codon_table_index = acid_char - 'A';

    /* Set pointer to appropriate codon */
    codon_ptr = codon_table[codon_table_index][preferred-1];

    return codon_ptr;

}
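
The conversion routines above pick one of two codons for each residue and flip the preference after every residue. The sketch below illustrates that alternation with a deliberately tiny table. The table is an assumption made for illustration only: the pairs for A and S are read off the sample output later in this listing ("AAM" encoded as GCC GCT ATG, "SS" as AGC TCC), and the names `demo_table`, `demo_codon` and `encode_alternating` are hypothetical, not the program's actual `codon_table`.

```c
#include <assert.h>
#include <string.h>

/* Illustrative two-choice codon table (assumed, not the original table). */
typedef struct {
    char aa;
    const char *choice[2];
} CodonPair;

static const CodonPair demo_table[] = {
    { 'A', { "GCC", "GCT" } },
    { 'M', { "ATG", "ATG" } },
    { 'S', { "AGC", "TCC" } },
};

/* Look up the codon for one residue; preferred is 1 or 2. */
static const char *demo_codon(char aa, int preferred) {
    size_t i;
    for (i = 0; i < sizeof demo_table / sizeof demo_table[0]; i++)
        if (demo_table[i].aa == aa)
            return demo_table[i].choice[preferred - 1];
    return "???";   /* same "not found" marker as the listing */
}

/* Encode aa into out, alternating the codon choice after every residue.
   out must have room for 3 * strlen(aa) + 1 bytes. */
void encode_alternating(const char *aa, int first_choice, char *out) {
    int preferred = first_choice;
    assert(first_choice == 1 || first_choice == 2);
    out[0] = '\0';
    for (; *aa != '\0'; aa++) {
        strcat(out, demo_codon(*aa, preferred));
        preferred = (preferred == 1) ? 2 : 1;
    }
}
```

Unlike `convert_aa_to_dna()`, this sketch clears the output buffer itself instead of relying on the caller having set the first byte to a null.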



/* display_genes() */

/* Display the name and data for all genes */

int display_genes() {

    P_GENE cur_gene = first_gene;

    while (NULL != cur_gene) {

        printf("%s\n",cur_gene->name);
        printf("%s\n",cur_gene->data);
        cur_gene = cur_gene->next_gene;

    }

    return E_NOERROR;

}



/* perform_scramble() */
/* Scramble the segments */

/* Check for adjacent segments. If there are, rescramble */

int perform_scramble() {

    int done = FALSE;
    int rc = E_NOERROR;

    while (TRUE) {

        rc = scramble_segments();
        if (E_NOERROR == rc) {

            if (adjacent_segments()) {

                printf("Adjacent segments detected! Rescramble? (y/n) ");
                if (!user_confirmation()) {

                    printf("WARNING: Adjacent segments in output file!\n");
                    break;

                }

            }

            else
                break;

        }

        else
            break;

    }

    return rc;

}

/* scramble_segments() */

/* Randomly scramble the segments, putting pointers in scrambled_segments[] */

int scramble_segments() {

    P_GENE_SEGMENT cur_seg = first_segment;
    int i,j;
    P_GENE_SEGMENT temp;

    printf("Scrambling segments...\n");

    /* Allocate storage for array of segment pointers */
    if (NULL == (scrambled_segments = malloc(sizeof(P_GENE_SEGMENT)*num_segments)))
        return E_MALLOC;

    /* First, initialise scrambled_segments in same order as linked list */
    i = 0;
    while (cur_seg != NULL) {

        scrambled_segments[i] = cur_seg;
        cur_seg = cur_seg->next_seg;
        i++;

    }

    /* Now, randomly scramble the segments */
    for (i=0;i<num_segments;i++) {

        j = rand() % num_segments;

        temp = scrambled_segments[i];
        scrambled_segments[i] = scrambled_segments[j];
        scrambled_segments[j] = temp;

    }

    return E_NOERROR;

}
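
The scramble above swaps each position with a position drawn from the whole array, which does not make every ordering equally likely. A commonly used alternative is the Fisher-Yates shuffle, sketched below on a plain `int` array. This is a substitute technique shown for comparison, not the method of the listing, and `shuffle_ints` is a hypothetical name. It still relies on `rand()`, so the usual small modulo bias and the need to seed with `srand()` apply.

```c
#include <assert.h>
#include <stdlib.h>

/* Fisher-Yates shuffle: position i is swapped with a position chosen
   from 0..i only, working down from the end of the array. */
void shuffle_ints(int *items, int n) {
    int i;
    assert(items != NULL || n <= 0);
    for (i = n - 1; i > 0; i--) {
        int j = rand() % (i + 1);   /* 0 <= j <= i */
        int tmp = items[i];
        items[i] = items[j];
        items[j] = tmp;
    }
}
```

The same loop would work on an array of segment pointers by changing the element type.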

/* adjacent_segments() */

/* Determine if the scrambled segment order has resulted in */
/* two segments which were adjacent originally (ie every */
/* second one) have ended up adjacent. */

int adjacent_segments() {
    int i;
    int rc = 0;
    P_GENE_SEGMENT cur_seg;
    P_GENE_SEGMENT next_seg;

    for (i=0;i<num_segments-1;i++) {

        /* Address current and next segments */
        cur_seg = scrambled_segments[i];
        next_seg = scrambled_segments[i+1];

        /* Do segments come from same gene, and are two apart? */
        if (((cur_seg->p_gene == next_seg->p_gene)
            && ((cur_seg->number == (next_seg->number)+2)
            || (cur_seg->number == (next_seg->number)-2))))
            return 1;

    }

    return 0;

}
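
When the overlap is half the segment length, segment n+2 begins exactly where segment n ends, so placing the two side by side would rebuild a stretch of the parent sequence. That appears to be why the check above looks for numbers that differ by two. Below is a self-contained sketch of the same test on a simplified record; `SegRef` and `has_adjacent_pair` are hypothetical names, and an integer gene id stands in for the gene pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified view of a segment: which gene, which number. */
typedef struct {
    int gene_id;
    int number;
} SegRef;

/* Return 1 if any two neighbours in the scrambled order come from the
   same gene and have segment numbers exactly two apart, else 0. */
int has_adjacent_pair(const SegRef *order, size_t n) {
    size_t i;
    assert(order != NULL || n == 0);
    for (i = 0; i + 1 < n; i++) {
        int diff = order[i].number - order[i + 1].number;
        if (order[i].gene_id == order[i + 1].gene_id
            && (diff == 2 || diff == -2))
            return 1;
    }
    return 0;
}
```

As in the listing, both orders are flagged (n followed by n+2, and n+2 followed by n).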
/* write_output_file() */

/* Write out segments (in initial non-scrambled order) */
/* Write out synthetic protein (in scrambled order) */
/* Write out synthetic dna (in scrambled order) */

int write_output_file() {
    FILE * output_file;
    char * amino_buffer;
    P_GENE_SEGMENT cur_seg;
    int i;

    /* Open output file for writing (erase any contents) */
    if (NULL == (output_file = fopen(output_file_name,"w")))
        return E_CREATE_OUTPUT_FILE;

    /* Allocate memory for padded amino string buffer */
    if (NULL == (amino_buffer = malloc(len_segment*3+1)))
        return E_MALLOC;

    printf("Writing output file: %s\n",output_file_name);

    /* Write output file header information */
    fprintf(output_file,"Scramble %s - Output File\n",VERSION_NO);
    fprintf(output_file,"\n");
    fprintf(output_file,"Disease name : %s\n",disease_name);
    fprintf(output_file,"Input filename : %s\n",input_file_name);
    fprintf(output_file,"Output filename : %s\n",output_file_name);
    fprintf(output_file,"Number genes : %d\n",num_genes);
    fprintf(output_file,"Number segments : %d\n",num_segments);
    fprintf(output_file,"Segment length : %d\n",len_segment);
    fprintf(output_file,"Segment overlap : %d\n",segment_overlap);

    /* Write out segments in initial non-scrambled order */
    fprintf(output_file,"\n");
    fprintf(output_file,"Segments in original order:\n");
    fprintf(output_file," -\n");

    cur_seg = first_segment;
    while (NULL != cur_seg) {

        /* Format amino data to line up with codons */
        pad_amino_string(cur_seg->amino_data,amino_buffer);

        fprintf(output_file,"Gene : %s\n",cur_seg->p_gene->name);
        fprintf(output_file,"Segment# : %d\n",cur_seg->number);
        fprintf(output_file,"Offset : %d\n",cur_seg->offset);
        fprintf(output_file,"1st Codon : %d\n",cur_seg->first_codon_choice);
        fprintf(output_file,"%s\n",amino_buffer);
        fprintf(output_file,"%s\n",cur_seg->dna_data);
        fprintf(output_file,"\n");
        cur_seg = cur_seg->next_seg;

    }

    /* Write out segment names in scrambled order */
    fprintf(output_file,"Segments in scrambled order:\n");
    fprintf(output_file," -\n");

    for (i=0;i<num_segments;i++) {

        /* Format amino data to line up with codons */
        pad_amino_string(scrambled_segments[i]->amino_data,amino_buffer);

        /* Write segment details */
        fprintf(output_file,"%s #%d\n",scrambled_segments[i]->p_gene->name,
                scrambled_segments[i]->number);
        fprintf(output_file,"%s\n",amino_buffer);
        fprintf(output_file,"%s\n",scrambled_segments[i]->dna_data);
        fprintf(output_file,"\n");

    }

    /* Write synthetic protein in one long string */
    fprintf(output_file,"Synthetic Protein:\n");
    fprintf(output_file," -\n");

    for (i=0;i<num_segments;i++)
        fprintf(output_file,"%s",scrambled_segments[i]->amino_data);

    fprintf(output_file,"\n\n");

    /* Write synthetic dna in one long string */
    fprintf(output_file,"Synthetic DNA:\n");
    fprintf(output_file," \n");

    for (i=0;i<num_segments;i++)
        fprintf(output_file,"%s",scrambled_segments[i]->dna_data);

    return E_NOERROR;

}



/* strip_newline() */

/* Replace the first newline character with a null */

void strip_newline(char * strip_str) {
    char * newline_pos;

    /* Find the newline char */
    newline_pos = strchr(strip_str,'\n');

    /* If we found one, replace it with a null */
    if (NULL != newline_pos)
        newline_pos[0] = '\0';

}

/* pad_amino_string */

/* Copy amino chars from amino_ptr to padded_ptr, padding each */
/* side with a space. */

void pad_amino_string(char * amino_ptr, char * padded_ptr) {

    while ('\0' != *amino_ptr) {
        *padded_ptr = ' ';
        padded_ptr++;
        *padded_ptr = *amino_ptr;
        padded_ptr++;
        *padded_ptr = ' ';
        padded_ptr++;
        amino_ptr++;

    }

    /* Stick a null at the end of the padded string */
    *padded_ptr = '\0';

}
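
Padding each residue to three characters puts the one-letter amino acid code above the middle base of its codon when the two strings are printed on consecutive lines. A minimal self-contained sketch of that layout follows; `pad_residues` and `padded_size` are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <string.h>

/* Bytes needed for the padded form of amino, including the final null. */
size_t padded_size(const char *amino) {
    assert(amino != NULL);
    return 3 * strlen(amino) + 1;
}

/* Write " X " for every residue X, so each residue spans one codon width. */
void pad_residues(const char *amino, char *padded) {
    assert(amino != NULL && padded != NULL);
    while (*amino != '\0') {
        *padded++ = ' ';
        *padded++ = *amino++;
        *padded++ = ' ';
    }
    *padded = '\0';
}
```

For the residues "MST" the padded string is " M  S  T ", nine characters wide, which lines up with a nine-base DNA string printed underneath.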



/* even() */

/* True if test_num is even, otherwise false */
int even(int test_num) {

    return !(test_num % 2);

}

/* read_int() */

/* Read an integer from stdin. Keep trying until valid int > 0 entered. */
/* Return the integer read, or 0 if error reading from stdin. */

int read_int(char * prompt) {

    char buffer[KEYBOARD_BUFFER_SIZE];
    int value_read;
    int valid = FALSE;

    while (!valid) {

        printf("%s",prompt);
        valid = TRUE;

        /* Give up (return 0) if stdin cannot be read */
        if (NULL == fgets(buffer,KEYBOARD_BUFFER_SIZE,stdin))
            return 0;
        if (1 != sscanf(buffer,"%d",&value_read))
            valid = FALSE;
        if (valid && (value_read < 1))
            valid = FALSE;

        if (!valid)
            printf("Positive integer value please!\n");

    }

    return value_read;

}

/* read_str() */

/* Read a string from the user (stdin) */
/* Strip the newline from it */

void read_str(char * prompt,char * string) {

    char buffer[KEYBOARD_BUFFER_SIZE];

    printf("%s",prompt);

    fgets(buffer,KEYBOARD_BUFFER_SIZE,stdin);
    sscanf(buffer,"%s",string);

}

/* read_nonblank_line() */

/* Read a line from file until we get a non-blank one */

char * read_nonblank_line(char * buf,int buf_size,FILE * in_file) {
    char * return_ptr;

    /* Read lines until we get a non-blank one, or EOF */
    do
        return_ptr = fgets(buf,buf_size,in_file);
    while ((NULL != return_ptr) && (('\n' == buf[0]) || (' ' == buf[0])));

    /* If we got a line, change the newline char to a null */
    if (NULL != return_ptr)
        strip_newline(buf);

    return return_ptr;

}

/* user_confirmation() */

/* Read input from user. If user types y, return 1, otherwise 0 */

int user_confirmation() {

    char buffer[KEYBOARD_BUFFER_SIZE];

    fgets(buffer,KEYBOARD_BUFFER_SIZE,stdin);
    if (('y' == buffer[0]) || ('Y' == buffer[0]))
        return 1;
    else
        return 0;

}

/* test() */

/* For debugging/development */

void test() {

    char str[100];

    printf("Enter something: ");

    fgets(str,100,stdin);

    printf("line1\n");

    printf("%s",str);

    printf("line2\n");

    fgets(str,100,stdin);
HepC Savine design



HepC 1a consensus polyprotein sequence used for scramble program

MSTNPKPQRKTKRNTWRRPQDVKFPGGGQIVGGVYLLPRRGPRLGWATRKTSERSQPRGR^ 
PGYPWPLYGIJEGCGWAGWIiIiSPRGSRPSWGPTDPRRRSRNLGKVIDTLTCGFADIiMGyiPIiVGAPL 
VLEDGVNyATGNLPGCSFS XFLLALLSCLTVPASAYQVRNSTGLYHVTNDCPNSS XVYEAADAILHTPGCVPCVREGM 
ASRCWAMTPTVATRDGKLPATQLRRHIDLLVGSATLCSALiYVGDLCGSVFLVGQLFTFSPRRHWTTQGC 
ITGHRMAWDMMMNWSPTAALVMAQLLRIPQAILDMIAGAHWGVLAGIAYFSMVGNWAK^ 

HAGRTTSGLVSLLTPGAKQWIQLINTNGSWHINSTALNCNESLMTGWIiAGLFYQHKFNSSGCPERLASCRRLTDFDQG 

WGPISYAWGSGPDQRPYCWHYPPKPCGIVPAKSVCGPVYCFTPSPVWGTTDRSGAPTYSWGAWDTDVFVLNNTRPPL 

GNWFGCTWMNSTGFTKVCGAPPCVIGGAG3SINTLHCPTDCFRKHPEATYSRCGSGPWITP 

TIFK^MYVGGVEHRLEAACKWTRGERCDLEDRDRSELSPIiLIiSTTQWQVLPCSFTTLPALSTGLI 

YGVGS S I AS WAI KWE YWLIiPLLIADARVCS CLWMMLLI S QAEAALENLVILNAASLAGTHGL VS FLVFFCFAWYIjKG 

RWPGAVYALYGMWPLLLLLLAIiPQRAYALDTHVAASCGGVVIjVGLM^ 

VWPPLlSTTOGGRDAVIIiLMCWHPTLVFDITKLLLAVFGPLWILQASLLKVPYFVRVQ^ 

AIIKLGALTGTYVYNHLTPLRDWAHNGLRDLAVAVEPWFSQMETKIilTWGADT^^ 

ADGMVSKGWRLLAPITAYAQQTRGLLGCIITSIiTGRDKlsrQVEGEVQIVSTAAQTFLATCINGVCWTVYHGAGTRTIAS 
PKGPVIQMYTMOT)QDLVGWPAPQGSRSLTPCTCGSSDLYLVTRHADVIPVRRRGDSRGSL 

CPAGHAVGIFRAAVCTRGVAKAVDFIPVEWLETTMRSPVFTDNSSPPAVPQSPQVAHLHAPTGSGKSTKVPAAYAAQG 
YKVIJVLNPSVAATIjGFGAYMSKAHGIDPNIRTGTOTITTGSPITYSTYGKFIiADGGC 

LGIGTVLDQAETAGARLVVLATATPPGSVTVPHPNIEEVALSTTGEIPFYGKAIPLEVIKGGRHLIFCHSKI^ 

AKLVALGINAVAYYRGLDVSVIPTSGDWWATDALMTGYTGDFDSVIDCNTCVTQTVDFSLDPTFTIETTTLPQDAV 

SRTQRRGRTGRGKPGIYRFVAPGERPSGMFDSSVLCECYDAGCAWYEIiTPAETOVRIjRAYMNTPGL^ 

VPTGLTHIDAHFLSQTKQSGENFPYIiVAYQATVCARAQAPPPSWDQMWKCLIRLKPTIiHGPTPLLYRLGAVQNEVTLT 

HPVTKYIMTCMSADLEWTSTWVLVGGVLAALAAYCLSTGCWIVGRIVDSGKPAIXPDREVLYREFDEMEECSQHLP 

YIEQGMMLAEQFKQKALGIiLQTASRQAEVIAPAVQTNWQKLEVFWAKHMWNFISGIQYLAGLSTLPG 

AAVTSPLTTSQTLLFNIIiGGWVAAQIiAAPGAATAFVGAGLAGAAIGSVGLGKyiiVDILAGYGAGVAGAIiVAFKIMSGE 

VPSTEDLWLIiPAILSPGALWGWCAAILRRHVGPGEGAVQWIVai^ 

LTVTQLLRRLHQWISSECTTPCSGSWLRDIWDWICEVLSDFKTWLKAKLMPQLPGIPPVSCQRGYKGWRGDGI^I^ 
CHCGAEITGHVKNGTMRIVGPRTCRNMWSGTFPINAYTTGPCTPIiPAPNYTFALm 

TDimKCPCQVPSPEFFTELDGVRIiHRFAPPCKPLIiREEVSFRVGLHEYPVGSQLPCEPEPDVAVLTSMLTDPSHITAE 
AAGRRLARGSPPSMASSSASQLSAPSLKATCTAKTHDSPDAELIEANIiLWRQEMGGNI 

EDEREISVPAEILRKSRRFAQALPWARPDYNPPLVETWKKPDYEPPVVHGCPLPPPRSPPVPPPRKICRTWLTE^ 

STALAEIiATKSFGSSSTSGITGDNTTTSSEPAPSGCPPDSDAESYSSMPPLEGEPGDPDLSDGSWSTVSSEAGTEDW 

CCSMSYSWTGALVTPCAAEEQKLPINALSNSLLRHHNLVYSTTSRSACQRQKKVTFDRLQVLDSHYQDVLKEVK^^ 

KVKANLLSVEEACSLTPPHSAKSKFGYGAKDWCHARKAVAHINSWKDIiIjEDSVTPIDTTI 

KPARLIVFPDLGVRVCEKMALYDWSKIiPLAVMGSSYGFQYSP 

IRTEEAIYQCCDIiDPQARVAIKSLTERLYVGGPLTWSRGENCGYRRCRASGVLTTSCGNTLTCYIKARA^ 

CTMLVCGDDLWICESAGVQEDAASLRAFTEAMTRYSAPPGDPPQPEYDLELITSCSSlSrV'SVAHDGAGKRW'^ 

TTPLARAAWETARHTPWSWLGNIIMFAPTLWARMILMTHFFSVLIARDQLEQALDCEIYGACYSXEPIiDIiPPII 

HGLSAFSLHSYSPGEINRVAACIiRKLGVPPLRAWRHRARSVRARLLARGGRAAICGKYLFNWAVRTKL 

IiDLSGWFTAGYSGGDIYHSVSHARPRWFWFCIiLLLAAGVGXYLLPNR 



Scramble - Output File

Scramble version : 0.1 beta, 08/02/1999 
Num. genes : 1 

Num. segments : 201

Segment length : 30 
Segment overlap : 15 

Segments in original order: 



Gene : HepCla 

Segment# : 1 
Offset : 1 
1st Codon : 1 

AAMSTNPKPQRKTKRNTNRRPQDVKPPGGG 
GCCGCTATGTCCaCCS^TCCCaJUlCCCCAAAGGAAAACCaAAAGGAATACCA^ 
Gene : HepCla

Segment# : 2
Offset : 16 
1st Codon : 1 

NTNRRPQDVKFPGGGQIVGGVYLLPRRGPR 
AACACAAACAGAAGGCCTC7^GGATGTGAAATTCCCTGGCGGAGGCCA?iATCGTCGGCGGAGTGTATCTGCTCCCCAGAAGGGGACCCAGA 

Gene : HepCla

Segment# : 3
Offset : 31 

1st Codon : 1 

QIVGGVYLIiPRRGPRLGVRATRKTSERS QP 
CAGATTGTGGGAGGCGTCTACCTCCTGCCTAGGAGAGGCCCTAGGCTCGGCGTCAGGGCTACCAGAAAGACAAGCGAAAGGTCCCAGCCT 

Gene : HepCla 

Segment# : 4
Offset : 46 
1st Codon : 1 

LGVRATRKTSERSQPRGRRQPIPKARRPEG 
CTGGGAGTGAGAGCCACAAGGAAAACCTCCGAGAGAAGCCAACCCAGAGGCA6AAGGCAACCCATTCCCAAAGCCAGAAGGCCT6AGGGA 

Gene : HepCla 

Segment# : 5
Offset : 61 

1st Codon : 1 

RGRRQPI PKARRPEGRTWAQPGYPWPLYGN 
AGGGGAAGGAGACAGCCTATCCCTAAGGCTAGGAQACCCGAAGGCAGAACCTGGGCCCAACCCGGATACCCTTGGCCTCTGTATGGCAAT 

Gene : HepCla

Segment# : 6
Offset : 76 

1st Codon : 1 

RTWAQPGYPWPIiYGNEGCGWAGWLIiSPRGS 
AGGACATGGGCTCAGCCTGGCTATCCCTGGCCCC1?CTACGGAAACGAAGGCTGTGGCTGGGCCGGATGGCTCCTGTCCCCCAGAGGCTCC 

Gene : HepCla 

Segment# : 7
Offset : 91 
1st Codon : 1 

EGCGWAGWLIiSPRGSRPSWGPTDPRRRSRN 
GAGGGATGCGGATGGGCTGGCTGGCTGCTCAGCCCTAGGGGAAGCAGACCCTCCTGGGGACCCACAGACCCTAGGAGAAGGTCCAGGAAT 

Gene : HepCla 

Segment# : 8
Offset : 106 
1st Codon : 1 

RPSWGPTDPRRRSRNLGKVIDTLTCGFADIj 
AGGCCTAGCTGGGGCCCTACCGATCCCAGAAGGAGAAGCAGAflACCTCGGCAAAGTGATTGACACACTGACATGCGGATTCGCTGACCTC 

Gene : HepCla 

Segment# : 9
Offset : 121 
1st Codon : 1 

LiGKVIDTIiTCGPADIiMGYI PLVGAPIiGGAA 
CTGGGAAAGGTCATC6ATACCCTCACCT6TGGCTTTGCCGATCTGATGGGCTATATCCCTCTGGTCGGCGCTCCCCTCGGCGGAGCCGCT 

Gene : HepCla 

Segment# : 10
Offset : 136 
1st Codon : 1 

MGYI PliVGAPLGGAARALAHGVRVLEDGVN 
ATGGGATACATTCCCCTCGTGGGAGCCCCTCTGGGAGGCGCTGCCAGAGCCCTCGCCCATGGCGTCAGGGTCCTGGAAGACGGAGTGAAT 

Gene : HepCla 

Segment# : 11
Offset : 151
1st Codon : 1

RALAHGVRVIiEDGVNYATGNIiPGCS FS I FL 
AG6GCTCTGGCTCACGGAGTGAGAGT6CTCGAGGATGGCGTCAACTATGCCACAGGCAATCTGCCTGGCTGTAGCTTTAGCATTTTCCTC 

Gene : HepCla 

Segment# : 12
Offset : 166
1st Codon : 1

YATGNliPGCSFSIFLLALLSCIiTVPASAYQ 
TACGCTACCGGAAACCTCCCCGGATGCTCCTTCTCCATCTTTCTGCTCGCCCTCCTGTCCTGCCTCACCGTCCCCGCTAGCGCTTACCAA 

Gene : HepCla 

Segment# : 13 
Offset : 181 
1st Codon : 1 

LALLS CIjTVPASAYQVRNSTGLYHVTNDCP 
CTGGCTCTGCTCAGCTGTCTGACAGT6CCTGCCTCCGCCTATCAGGTCAGGAATAGCACAGGCCTCTACCATGTGACAAACGATTGCCCT 

Gene : HepCla 

Segment# : 14 
Offset : 196 

1st Codon : 1 

VRNSTGLYHVTNDCPNSSIVYEAADAILHT 
GTGAGAAACTCCACCGGACTGTATCACGTCACCAATGACTGTCCCAATAGCTCCATCGTCTACGAAGCCGCTGACGCTATCCTCCACACA 

Gene : HepCla 

Segment# : 15
Offset : 211 
1st Codon : 1 

NSSIVYEAADAILHTPGCVPCVREGNASRC 
AACTCCAGCATTGTGTATGAGGCTGCCGATGCCATTCTGCATACCCCTGGCTGTGTGCCTTGCGTCAGGGAAGGCAATGCCTCCAGGTGT 

Gene : HepCla 

Segment # : 16 
Offset : 226 

1st Codon : 1 

PGCVP CVREGNASRCWVAMTPTVATRDGKIi 
CCCGGATGCGTCCCCTGTGTGAGAGAGGGAAACGCTAGCAGATGCTGGGTGGCTATGACACCCACAGTGGCTACCAGAGACGGAAAGCTC 

Gene : HepCla 

Segment# : 17
Offset : 241 

1st Codon : 1 

WVAMTPTVATRDGKLPATQLRRHIDLLVGS 
TGGGTCGCCATGACCCCTACCGTCGCCACAAGGGATGGCAAACTGCCTGCCACACAGCTCAGGAGACACATTGACCTCCTGGTCGGCTCC 

Gene : HepCla 

Segment# : 18
Offset : 256 

1st Codon : 1 

PATQIiRRHIDLLVGSATLCSALYVGD IiCGS 
CCCGCTACCCAACTGAGAAGGCATATCGATCTGCTCGTGGGAAGCGCTACCCTCTGCTCCGCCCTCTACGTCGGCGATCTGTGTGGCTCC 

Gene : HepCla 

Segment# : 19
Offset : 271 

1st Codon : 1 

ATLCSALYVGDLCGSVFIiVGQIiFTFSPRRH 
GCCACACTGTGTAGCGCTCTGTATGTGGGAGACCTCTGCGGAAGCGTCTTCCTCGTGGGACAGCTCTTCACATTCTCCCCCAGAAGGCAT 

Gene : HepCla 

Segment# : 20
Offset : 286 

1st Codon : 1 

VFLVGQLFTFSPRRHWTTQ6CNCS lYPGHI 
GTGTTTCTGGTCGGCCAACTGTTTACCTTTAGCCCTAGGAGACACTGGACCACACAGGGATGCAATTGCTCCATCTATCCCGGACACAT 

Gene : HepCla 

Segment# : 21
Offset : 301 

1st Codon : 1 

WTTQGCNCSIYPGHXTGHRMAWDMMMNWSP 
TGGACAACCCAAGGCTGTAACTGTAGCATTTACCCTGGCCATATCACAGGCCATAGGATGGCCTGGGACATGATGATGAACTGGAGCCCT 

Gene : HepCla 

Segment# : 22
Offset : 316 

1st Codon : 1 

TGHRMAWDMMMNWS PTAALVMAQLLRI PQA 
ACCGGACACAGAATGGCTTGGGATATGATGATGAATTGGTCCCCCACAGCCGCTCTGGTCATGGCTCAGCTCCTGAGAATCCCTCAGGCT 
Gene : HepCla

Segment# : 23 
Offset : 331
1st Codon : 1 

TAAIiVMAQLLRIPQAI IiDMIAGAHWGVLAG 
ACCGCTGCCCTCGTGATGGCCCAACTGCTCAGGATTCCCCAAGCCATTCTGGATATGATTGCCGGAGCCCATTGGGGAGTGCTCGCC6GA 

Gene : HepCla 

Segment # : 24 
Offset : 346 

1st Codon : 1 

ILDMIAGAHWGVIjAGIAYFSMVGNWAKVLV 
ATCCTCGACATGATCGCTGGCGCTCACTGGGGCGTCCTGGCTGGCATTGCCTATTTCTCCATGGTCGGCAATTGGGCTAAGGTCCTGGTC 

Gene : HepCla 

Segment# : 25
Offset : 361
1st Codon : 1 

lAYFSMVGNWAKVLVVLLLFAGVDAETHVT 
ATC6CTTACTTTAGCATGGTGGGAAACTGGGCCAAAGTGCTCGTGGTCCTGCTCCTGTTTGCCGGAGTGGATGCCGAAACCCATGTGACA 

Gene : HepCla 

Segment # : 26 
Offset : 376 

1st Codon : 1 

VLIiLFAGVDAETHVTGGWAGRTTSGIiVSIiL 
GTGCTCCTGCTCTTCGCTGGCGTCGACGCTGAGACACACGTCACCGGAGGCAATGCCGGAAGGACAACCTCCGGCCTCGTGTCCCTGCTC 

Gene : HepCla 

Segment# : 27
Offset : 391 

1st Codon : 1 

GGNAGRTTSGLVSIiIiTPGAKQNIQIjINTNG 
GGCGGAAACGCTGGCAGAACCACAAGCGGACTGGTCAGCCTCCTGACACCCGGAGCa^AACAGAATATCCAACTGATTAACACAAACGG 

Gene : HepCla 

Segment# : 28 
Offset : 406 
1st Codon : 1 

TPGAKQNIQIiINTNGSWHINSTALNCNESL 
ACCCCTGGCGCTAAGCAAAACATTCAGCTCATCAATACCAATGGCTCCTGGCATATCAATAGCACAGCCCTCAACTGTAACGAAAGCCTC 

Gene : HepCla

Segment# : 29
Offset : 421

1st Codon : 1

SWHINSTALNCNESLNTGWLAGLFYQHKFN 
AGCTGGCACATTAACTCCACCGCTCTGAATTGCAATGAGTCCCTGAATACCGGATGGCTCGCCGGACTGTTTTACCAACACAAATTCAAT 

Gene : HepCla 

Segment# : 30
Offset : 436 

1st Codon : 1 

NTGWIiAGLFYQHKFNSSGCPERLASCRRLT 
AACACAGGCTGGCTGGCTGGCCTCTTCTATCAGCATAAGTTTAACTCCAGCGGATGCCCTGAGAGACTGGCTAGCTGTAGGAGACTGACA 

Gene : HepCla 

Segment# : 31
Offset : 451 

1st Codon : 1 

SSGCPERLASCRRLTDFDQGWGPISYANGS 
AGCTCCGGCTGTCCCGAAAGGCTCGCCTCCTGCA6AAGGCTCACCGATTTCGATCAGGGATGGGGACCCATTAGCTATGCCAATGGCTCC 

Gene : HepCla 

Segment# : 32
Offset : 466 

1st Codon : 1 

DFDQGWGPISYANGSGPDQRPYCWHYPPKP 
GACTTTGACCAAGGCTGGGGCCCTATCTCCTACGCTAACGGAAGCGGACCCGATCAGAGACCCTATTGCTGGCACTATCCCCCTAAGCCT 

Gene : HepCla

Segment# : 33
Offset : 481
1st Codon : 1 

GPDQRPYCWHYPPKPCGIVPAKSVCGPVYC 
GGCCCTGACCyiAAGGCCTTACTGTTGGCATTACCCTCCCAAACCCTGTGGCATTGTGCCTGCaiAAAGCG^ 



Gene : HepCla

Segment# : 34
Offset : 496
1st Codon : 1

CGIVPAKSVCGPVYCFTPSPVVVGTTDRSG
TGCGGAATCGTCCCCGCTAAGTCCGTGTGTGGCCCTGTGTATTGCTTTACCCCTAGCCCTGTGGTCGTGGGAACCACAGACAGAAGCGGA



Gene : HepCla

Segment# : 35
Offset : 511
1st Codon : 1



FTPSPVVVGTTDRSGAPTYSWGANDTDVFV 
TTCACACCCTCCCCCGTCGTGGTCGGCACAACCGATAGGTCCGGCGCTCCCACATACTCCTGGGGAGCCAATGACACAGACGTCTTCGTC 

Gene : HepCla 

Segment# : 36
Offset : 526 
1st Codon : 1 
APTYSWGAN 



Gene : HepCla 

Segment# : 37
Offset : 541 

1st Codon : 1 

LNNTRPPLGNWFGCTWMNSTGFTKVCGAPP 
CTGAATAACACAAGGCCTCCCCTCGGCAATTGGTTTGGCTGTACCTGGATGAATAGCACAGGCTTTACCAAAGTGTGTGGCGCTCCCCCT 



Gene : HepCla

Segment# : 38
Offset : 556
1st Codon : 1



WMNSTGFTKVCGAPPCVIGGAGNNTLHCPT 
TGGATGAACTCCACCGGATTCACAAAGGTCTGCGGAGCCCCTCCCTGTGTGATTGGCGGAGCCGGAAACAATACCCTCCACTGTCCCACA 



Gene : HepCla

Segment# : 39
Offset : 571
1st Codon : 1

CVIGGAGNNTLHCPTDCFRKHPEATYSRCG
TGCGTCATCGGAGGCGCTGGCAATAACACACTGCATTGCCCTACCGATTGCTTTAGGAAACACCCTGAGGCTACCTATAGCAGATGCGGA

Gene : HepCla 

Segment# : 40
Offset : 586

1st Codon : 1

DCFRKHPEATYSRCGSGPWITPRCLVDYPY
GACTGTTTCAGAAAGCATCCCGAAGCCACATACTCCAGGTGTGGCTCCGGCCCTTGGATTACCCCTAGGTGTCTGGTCGACTATCCCTAT 

Gene : HepCla 

Segment# : 41
Offset : 601 
1st Codon : 1 

SGPWITPRCLVDYPYRIiWHYPCTINYTI FK 
AGCG6ACCCTGGATCACACCCA6ATGCCTCGTGGATTACCCTTACAGACTGTGGCACTATCCCTGTACCATTAACTATACCATTTTCAAA 



Gene : HepCla

Segment# : 42
Offset : 616
1st Codon : 1



RLWHYPCTINYTIFKVRMYVGGVEHRLEAA 
A6GCTCTGGCATTACCCTTGCACAATCAATTACACAATCTTTAAGGTCAGGATGTACGTCGGCGGAGTGGAACACAGACTGGAAGCCGCT 

Gene : HepCla 

Segment# : 43
Offset : 631

1st Codon : 1

VRMYVGGVEHRLEAACNWTRGERCDLEDRD
GTGAGAATGTATGTGGGAGGCGTCGAGCATAGGCTCGAGGCTGCCTGTAACTGGACCAGAGGCGAAAGGTGTGACCTCGAGGATAGGGAT

Gene : HepCla 

Segment # : 44 
Offset : 646 

1st Codon : 1 

CNWTRGERCDLEDRDRSELSPLLLS TTQWQ 
TGCAATTGGACAAGGGGAGAGAGATGCGATCTGGAAGACT^GAGACAGAAGCGAACTGTCCCCCCTCCTGCTCAGa^CAACCCAATGGC^ 

Gene : HepCla

Segment # : 45 
Offset : 661 
1st Codon : 1 

RSELSPLLLSTTQWQVLPCSFTTLPALSTG 
AGGTCCGAGCTCAGCCCTCTGCTCCTGTCCACCACACAGTGGCAGGTCCTGCCTTGCTCCTTCACAACCCTCCCCGCTCTGTCCACCGGA 

Gene : HepCla 

Segment# : 46
Offset : 676 

1st Codon : 1 

VLPCSFTTLPALSTGLIHLHQNIVDVQYLY 
GTGCTCCCCTGTAGCTTTACCACACTGCCTGCCCTCAGCACAGGCCTCATCCATCTGCATCAGAATATCGTCGACGTCCAGTATCTGTAT 

Gene : HepCla 

Segment # : 47 
Offset : 691 

1st Codon : 1 

lilHLHQNJVDVQYLYGVGSSIASWAIKWEY 
CTGATTCACCTCCACCAAAACATTGTGGATGTGCAATACCTCTAC6GAGTGGGAAGCTCCATCGCTAGCTGGGCCATTAAGTGGGAGTAT 


Gene : HepCla 

Segment # : 48 
Offset : 706
1st Codon : 1 

GVGSS lASWAIKWEYVVLIiFLLLADARVCS 
GGCGTCGGCTCCAGCATTGCCTCCTGGGCTATCAAATGGGAATACGTCGTGCTCCTGTTTCTGCTCCTGGCTGACGCTAGGGTCTGCTCC 

Gene : HepCla 

Segment # : 49 
Offset : 721 

1st Codon : 1 

VVLLPIiLLADARVCSCLWMMLLISQAEAAL 
GTGGTCCTGCTCTTCCTCCTGCTCGCCGATGCCAGAGTGTGTAGCTGTCTGTGGATGATGCTGCTCATCTCCCAGGCTGAGGCTGCCCTC 

Gene : HepCla 

Segment# : 50
Offset : 736 

1st Codon : 1 

CriWMMLLISQAEAALENLVILNAASLAGTH 
TGCCTCTGGATGATGCTCCTGATTAGCCAAGCCGAAGCCGCTCTGGAAAACCTCGTGATTCTGAATGCCGCTAGCCTCGCCGGAACCCAT 

Gene : HepCla 

Segment# : 51
Offset : 751 
1st Codon : 1 

ENLVIIiNAASLAGTHGLVSFIiVFFCPAWYL 
GAGAATCTGGTCATCCTCAACGCTGCCTCCCTGGCTGGCACACACGGACTGGTCAGCTTTCTGGTCTTCTTTTGCTTTGCCTGGTACCTC 

Gene : HepCla 

Segment# : 52
Offset : 766 

1st Codon : 1 

G LVSFIiVFFCFAWYIiKGRWVPGAVYAIiYGM 
GGCCTCGTGTCCTTCCTCGTGTTTTTCTGTTTCGCTTGGTATCTGAAAGGCAGATGGGTCCCCGGAGCCGTCTACGCTCTGTATGGCATG 

Gene : HepCla 

Segment# : 53
Offset : 781

1st Codon : 1 

KGRWVPGAVYALYGMWPLLLLLIiAIiPQRAY 
AAGGGAAGGTGGGTGCCTGGCGCTGTGTATGCCCTCTACGGAATGTGGCCCCTCCTGCTCCTGCTCCTGGCTCTGCCTCAGAGAGCCTAT 

Gene : HepCla

Segment# : 54
Offset : 796
1st Codon : 1 

WPLLLLLLAIiPQRAYALDTEVAAS CGGVVL 
TGGCCTCTGCTCCTGCTCCTGCTCGCCCTCCCCCAAAGGGCTTACGCTCTGGATACCGAAGTGGCTGCCTCCTGCGGAGGCGTCGTGCTC 

Gene : HepCla 

Segment# : 55
Offset : 811 

1st Codon : 1 

ALDTEVAASCGGVVLVGIiMALTLS PYYKRY 
GCCCTCGACACAGAGGTC6CCGCTAGCTGTGGCGGAGTGGTCCTGGTCGGCCTCATGGCTCTGACACTGTCCCCCTATTACAAAAGGTAT 

Gene : HepCla 

Segment # : 56 
Offset : 826 

1st Codon : 1 

VGLMAIiTLSPYYKRYISWCLWWLQYPLTRV 
6TGGGACTGATGGCCCTCACCCTCAGCCCTTACTATAAGAGATACATTAGCTGGTGCCTCTGGTGGCTGCAATACTTTCTGACAAGGGTC 

Gene : HepCla 

Segment # : 57 
Offset : 841 
1st Codon : 1 

ISWCLWWLQYFLTRVEAQLHVWVPPLNVRG 
ATCTCCTGGTGTCTGTGGTGGCTCCAGTATTTCCTCACCAGAGTGGAAGCCCAACTGCATGTGTGGGTGCCTCCCCTCAACGTCAGGGGA 

Gene : HepCla 

Segment# : 58
Offset : 856 

1st Codon : 1 

EAQLHVWVPPLNVRGGRDAVILLMCVVHPT 
GAGGCTCAGCTCCACGTCTGGGTCCCCCCTCTGAATGTGAGAGGCG6AAGGGATGCCGTCATCCTCCTGATGTGCGTCGTGCATCCCACA 

Gene : HepCla 

Segment# : 59
Offset : 871 
1st Codon : 1 

GRDAVILZiMCVVHPTLVPDITKLLLAVFGP' 
GGCA6AGACGCTGTGATTCTGCTCATGTGTGTGGTCCACCCTACCCTCGTGTTTGACATTACCAAACTGCTCCTGGCTGTGTTTGGCCCT 

Gene : HepCla 

Segment # : 60 
Offset : 886 

1st Codon : 1 

liVFDITKLLIjAVFGPLWILQASIiLKVPYFV 
CTGGTCTTCGATATCACAAAGCTCCTGCTCGCCGTCTTCGGACCCCTCTGGATTCTGCAAGCCTCCCTGCTCAAGGTCCCCTATTTCGTC 

Gene : HepCla 

Segment# : 61 
Offset : 901 

1st Codon : 1 

IiWILQASIiLKVPYFVRVQGLLRICAliARKM 
CTGTGGATCCTCCAGGCTAGCCTCCTGAAAGTGCCTTACTTTGT6AGAGTGCAAGGCCTCCTGA6AATCTGTGCCCTCGCCA6AAAGATG 

Gene : HepCla 

Segment # : 62 
Offset : 916 
1st Codon : 1 

RVQGLLRICALARKM IGGHYVQMAIIKLGA 
AG6GTCCAGGGACTGCTCA6GATTTGCGCTCTGGCTAGGAAAAT6ATTGGCGGACACTATGTGCAAATGGCTATCATTAAGCTCGGCGCT 

Gene : HepCla 

Segment# : 63 
Offset : 931 

1st Codon : 1 

I GGHYVQMAIIKLGALTGTYVYNHLTPLRD 
ATCGGAGGCCATTACGTCCAGATGGCCATTATCAAACTGGGAGCCCTCACCGGAACCTATGTGTATAACCATCTGACACCCCTCAGGGAT 

Gene : HepCla 

Segment# : 64

Offset : 946

1st Codon : 1

LTGTYVYNHLTPLRDWAHNGLRDLAVAVEP
CTGACa^GGCACATACGTCTACAATCACCTCACCCCTCTGAGAGAGTGGGCCCATAACGGACTGAGAGAGCTCGCCGTCGCCGTCGAGCCT 

Gene : HepCla 

Segment # : 65 
Offset : 961 

1st Codon : 1

WAHNGLRDLAVAVEPVVFSQMETKLITWGA 
TGGGCTCACAATGGCCTCAGGGATCTGGCTGTGGCTGTGGAACCCGTCGTGTTTAGCCAAATGGAAACCAAACTGATTACCTGGGGCGCT 

Gene : HepCla

Segment# : 66 
Offset : 976 

1st Codon : 1 

VVPSQMETKIilTWGADTAACGDIINGIiPVS 
GTGGTCTTCTCCCAGATGGAGACAAAGCTCATCACATGGGGAGCCGATACCGCTGCCTGTGGCGATATCATTAACGGACTGCCTGTGTCC 

Gene : HepCla 

Segment # : 67 
Offset : 991 

1st Codon : 1 

DTAACGDIINGLPVSARRGREILLGPADGM 
GACACAGCCGCTTGCGGAGACATTATCAATGGCCTCCCCGTCAGCGCTAGGAGAGGCAGAGAGATTCTGCTCGGCCCTGCCGATGGCATG 

Gene : HepCla 

Segment # : 68 
Offset : 1006 
1st Codon : 1 

ARRGREILLGPADGMVSKGWRLLAPITAYA 
GCCAGAAGGGGAAGGGAAATCCTCCTGGGACCCGCTGACGGAATGGTCAGCAAAGGCTGGAGGCTCCTGGCTCCCATTAGCGCTTACGCT 

Gene : HepCla 

Segment # : 69 
Offset : 1021 

1st Codon : 1 

VSKGWRLLAPITAYAQQTRGLLGCIITSLT 
GTGTCCAAGGGATGGAGACTGCTCGCCCCTATCACAGCCTATGCCCAACAGACAAGGGGACTGCTCGGCTGTATCATTACCTCCCTGACA 

Gene : HepCla 

Segment# : 70
Offset : 1036 

1st Codon : 1 

QQTRGLLGCIITSLTGRDKNQVEGEVQIVS 
CAGCAAACCAGAGGCCTCCTGGGATGCATTATCACAAGCCTCACCGGAAGGGATAAGAATCAGGTCGAGGGAGAGGTCCAGATTGTGTCC 

Gene : HepCla 

Segment# : 71
Offset : 1051 
1st Codon : 1 

GRDKNQVEGEVQIVSTAAQTFLATCINGVC 
GGCAGAGACAAAAACCAAGTGGAAGGCGAAGTGCAAATCGTCAGCACAGCCGCTCAGACATTCCTCGCCACATGCATTAACGGAGTGTGT 

Gene : HepCla 

Segment# : 72
Offset : 1066 

1st Codon : 1 

TAAQTPLATCINGVCWTVYHGAGTRTIASP 
ACCGCTGCCCAAACCTTTCTGGCTACCTGTATCAATGGCGTCTGCTGGACCGTCTACCATGGCGCTGGCACAAGGACAATCGCTAGCCCT 

Gene : HepCla 

Segment# : 73
Offset : 1081

1st Codon : 1

WTVYHGAGTRTIASPKGPVIQMYTWVDQDL 
TGGACAGTGTATCACGGAGCCGGAACCAGAACCATTGCCTCCCCCAAAGGCCCTGTGATTCAGATGTACACAAACGTCGACCAAGACCTC 

Gene : HepCla 

Segment# : 74
Offset : 1096 

1st Codon : 1 

KGPVIQMYTNVDQDLVGWPAPQGSRSLTPC 
AAGGGACCCGTCATCCAAATGTATACCAATGTGGATCAGGATCTGGTCGGCTGGCCCGCTCCCCAAGGCTCCAGGTCCCTGACACCCTGT 
Gene : HepCla

Segment# : 75

Offset : 1111 
1st Codon : 1 

VGWPAPQGSRSLTPCTCGSSDLYLVTRHAD 
GTGGGATGGCCTGCCCCTCAGGGAAGCAGAAGCCTCACCCCTT6CACa.TGCGGAAGCTCCGACCTCTACCTCGTGAC^GGCATGCCGAT 

Gene : HepCla 

Segment# : 76
Offset : 1126
1st Codon : 1

TCGSSDLYLVTRHADVIPVRRRGDSRGSLL 
ACCTGTGGCTCCAGCGATCTGTATCTGGTCACCAGACACGCTGACGTCATCCCTGTGAGAAGGAGAGGCGATAGCAGAGGCTCCCTGCTC 

Gene : HepCla 

Segment # : 77 
Offset : 1141 
1st Codon : 1 

VIPVRRRGDSRGSIiLSPRPISYLKGSSGGP 
GTGATTCCCGTCAGGAGAAGGGGAGACTCCAGGGGAAGCCTCCTGTCCCCCAGACCCATTAGCTATCTGAAAGGCTCCAGCGGAGGCCCT 

Gene : HepCla 

Segment # : 78 
Offset : 1156
1st Codon : 1 

SPRPISYIiKGSSGGPLIiCPAGHAVGIPRAA 
AGCCCTAGGCCTATCTCCTACCTCAAGG6AAGCTCCGGCGGACCCCTCCTGTGTCCCGCTGGCCATGCCGTCGGCATTTTCAGAGCCGCT 

Gene : HepCla 

Segment# : 79 
Offset : 1171 

1st Codon : 1 

LLCPAGHAVGIFRAAVCTRGVAKAVDFI P V 
CTGCTCTGCCCTGCCGGACACGCTGTGGGAATCTTTAGGGCTGCCGTCTGCACAAGGGGAGTGGCTAAGGCTGTGGATTTCATTCCCGTC 

Gene : HepCla 

Segment# : 80
Offset : 1186 
1st Codon : 1 

VCTRGVAKAVDFI PVENIiETTMRS PVFTDN 
GTGTGTACCAGAGGCGTCGCCAAAGCCGTC6ACTTTATCCCTGTGGAAAACCTCGAGACAACCATGAGGTCCCCCGTCTTCACAGACAAT 

Gene : HepCla 

Segment # : 81 
Offset : 1201 

1st Codon : 1 

ENLETTMRS PVFTDNSSPPAVPQSFQVAHIj 
GAGAATCTGGAAACCACAATGAGAA6CCCTGTGTTTACC6ATAACTCCAGCCCTCCCGCTGTGCCTCAGTCCTTCCAAGTGGCTCACCTC 

Gene : HepCla 

Segment# : 82
Offset : 1216 
1st Codon : 1 

SSPPAVPQSFQVAHLHAPTGSGKSTKVPAA 
A6CTCCCCCCCTGCCGTCCCCCAAAGCTTTCAGGTCGCCCATCTGCATGCCCCTACCGGAAGCGGAAAGTCCACCAAAGTGCCTGCCGCT 

Gene : HepCla 

Segment# : 83
Offset : 1231 
1st Codon : 1 

HAPTGSGKS TKVPAAYAAQGYKVLVLNPSV 
CACGCTCCCACAGGCTCCGGCAAAAGCACAAAGGTCCCCGCTGCCTATGCCGCTCAGGGATACAAAGTGCTCGTGCTCAACCCTAGCGTC 

Gene : HepCla 

Segment# : 84
Offset : 1246 

1st Codon : 1 

YAAQGYKVLVLNPSVAATLGFGAYMSKAHG 
TACGCTGCCCAAGGCTATAAGGTCCTGGTCCTGAATCCCTCCGTGGCTGCCACACTGGGATTCGGAGCCTATATGTCCTiAGGCTCACGGA 

Gene : HepCla 

Segment# : 85
Offset : 1261
1st Codon : 1

AATLGFGAYMSKAHGIDPNIRTGVRTITTG 
GCCGCTACCCTCGGCTTTGGCGCTTACATGAGCAAAGCCCaiTGGCATTGACCCTAACATTAGGACAGGCGTm 

Gene : HepCla 

Segment # : 86 
Offset : 1276 
1st Codon : 1 

IDPNIRTGVRTITTGSPITYSTYGKFIjADG 
ATCGATCCa^TATCAGAACCGGAGTGAGAACCATTACCACAGGCTCCCCCATTACCTATAGCACATACGGAAAGTTTCTGGCTGACGGA 

Gene : HepCla 

Segment# : 87 
Offset : 1291 
1st Codon : 1 

SPITYSTYGKFIiADGGCSGGAYDIIICDEC 
AGCCCTATCACATACTCCACCTATGGCAAATTCCTCGCCGATGGCGGATGCTCCGGCG6AGCCTATGACATTATCATTTGCGATGAGTGT 

Gene : HepCla 

Segment# : 88 
Offset : 1306 
1st Codon : 1 

GCSGGAYDIIICDECHSTDATSILGIGTVI, 
GGCTGTAGCGGAGGCGCTTACGATATCATTATCTGTGACGAATGCCATAGCACAGACGCTACCTCCATCCTCGGCATTGGCACAGTGCTC 

Gene : HepCla 

Segment# : 89
Offset : 1321 

1st Codon : 1 

HSTDATS I IiGIGTVIiDQAETAGARLVVLAT 
CACTCCACCGATGCCACAAGCATTCTGGGAATCGGAACCGTCCTGGATCAGGCTGAGACAGCCGGAGCCAGACTGGTCGTC^CTCGCCACA 

Gene : HepCla 

Segment # : 90 
Offset : 1336 
1st Codon : 1 

DQAETAGARIiVVLATATPPGSVTVPHPNI E 
GACCAA6CCGAAACCGCTGGCGCTAGGCTCGTGGTCCTGGCTACCGCTACCCCTCCCGGAAGCGTCACCGTCCCCCATCCCAATATCGAA 

Gene : HepCla 

Segment # : 91 
Offset : 1351 
1st Codon : 1 

ATPPGSVTVPHPNIEEVAIiSTTGEIPFYGK 
GCCACACCCCCTGGCTCCGTGACAGTGCCTCACCCTAACATTGA6GAAGTGGCTCTGTCCACCACAGGCGAAATCCCTTTCTATGGCAAA 

Gene : HepCla 

Segment# : 92 
Offset : 1366

1st Codon : 1 

EVAIiSTTGEIPFYGKAIPLEVIKGGRHLI F 
GAGGTCGCCCTCAGCACAACCGGAGAGATTCCCTTTTACGGAAAGGCTATCCCTCTGGAAGTGATTAAGGGAGGCAGACACCTCATCTTT 

Gene : HepCla 

Segment# : 93
Offset : 1381 
1st Codon : 1 

AIPLEVIKGGRHLIFCHSKKKCDELAAKLV 
GCCATTCCCCTCGAGGTCATCAAAGGCGGAAGGCATCTGATTTTCTGTCACTCCAAGAAAAAGTGTGACGAACTGGCTGCCAAACTGGTC 

Gene : HepCla 

Segment# : 94
Offset : 1396 

1st Codon : 1 

CHSKKKCDELAAKLVALGINAVAYYRGIiDV 
TGCCATAGCAAAAAGAAATGCGATGAGCTCGCCGCTAAGCTCGTGGCTCTGGGAATCAATGCCGTCGCCTATTACAGAGGCCTCGACGTC 

Gene : HepCla 

Segment# : 95
Offset : 1411 

1st Codon : 1 

ALGINAVAYYRGLDVSVI PTS6DVVVVATD 
GCCCTCGGCATTAACGCTGTGGCTTACTATAG6GGACTGGATGTGTCCGTGATTCCCACAAGCGGAGACGTCGTGGTCGTGGCTACCGAT 
Gene : HepCla

Segment# : 96
Offset : 1426 
1st Codon : 1 

SVIPTSGDVVVVATDALMTGYTGDFDSVID 
AGCGTCATCCCTACCTCCGGCGATGTGGTCGTGGTCGCCACAGACGCTCTGATGACCGGATACACAGGCGATTTCGATAGCGTCAT,CGAT 

Gene : HepCla 

Segment# : 97 
Offset : 1441 

1st Codon : 1 

ALMTGYTGDFDSVIDCNTCVTQTVDFSLDP 
GCCCTCATGACAGGCTATACCGGAGACTTTGACTCCGTGATTGACTGTAACACATGCGTCACCCAAACCGTCGACTTTAGCCTCGACCCT 

Gene : HepCla 

Segment # : 98
Offset : 1456
1st Codon : 1

CNTCVTQTVDFSLDPTFTIETTTLPQDAVS
TGCAATACCTGTGTGACACAGACAGTGGATTTCTCCCTGGATCCCACATTCACAATCGAAACCACAACCCTCCCCCAAGACGCTGTGTCC 

Gene : HepCla 

Segment # : 99
Offset : 1471
1st Codon : 1

TFTIETTTLPQDAVSRTQRRGRTGRGKPGI
ACCTTTACCATTGAGACAACCACACTGCCTCAGGATGCCGTCAGCAGAACCCAAAGGAGAGGCAGAACCGGAAGGGGAAAGCCTGGCATT 

Gene : HepCla 

Segment # : 100
Offset : 1486 
1st Codon : 1 

RTQRRGRTGRGKPGIYRFVAPGERPSGMFD 
AGGACACAGAGAAGGGGAAGGACAGGCAGAGGCAAACCCGGAATCTATAGGTTTGTGGCTCCCGGAGAGAGACCCTCCGGCATGTTCGAT 

Gene : HepCla 

Segment # : 101
Offset : 1501
1st Codon : 1

YRFVAPGERPSGMFDSSVLCECYDAGCAWY
TACAGATTCGTCGCCCCTGGCGAAAGGCCTAGCGGAATGTTTGACTCCAGCGTCCTGTGTGAGTGTTACGATGCCGGATGCGCTTGGTAT 

Gene : HepCla 

Segment # : 102
Offset : 1516
1st Codon : 1

SSVLCECYDAGCAWYELTPAETTVRLRAYM
AGCTCCGTGCTCTGCGAATGCTATGACGCTGGCTGTGCCTGGTACGAACTGACACCCGCTGAGACAACCGTCAGGCTCAGGGCTTACATG

Gene : HepCla 

Segment # : 103
Offset : 1531
1st Codon : 1

ELTPAETTVRLRAYMNTPGLPVCQDHLEFW
GAGCTCACCCCTGCCGAAACCACAGTGAGACTGAGAGCCTATATGAATACCCCTGGCCTCCCCGTCTGCCAAGACCATCTGGAATTCTGG

Gene : HepCla 

Segment # : 104
Offset : 1546
1st Codon : 1

NTPGLPVCQDHLEFWEGVFTGLTHIDAHFL
AACACACCCGGACTGCCTGTGTGTCAGGATCACCTCGAGTTTTGGGAAGGCGTCTTCACAGGCCTCACCCATATCGATGCCCATTTCCTC 

Gene : HepCla 

Segment # : 105
Offset : 1561 
1st Codon : 1 

EGVFTGLTHIDAHFLSQTKQSGENFPYLVA 
GAGGGAGTGTTTACCGGACTGACACACATTGACGCTCACTTTCTGTCCCAGACAAAGCAAAGCGGAGAGAATTTCCCTTACCTCGTGGCT 

Gene : HepCla 

Segment # : 106
Offset : 1576
1st Codon : 1

SQTKQSGENFPYLVAYQATVCARAQAPPPS
AGCCAAACCAAACAGTCCGGCGAAAACTTTCCCTATCTGGTCGCCTATCAGGCTACCGTCTGCGCTAGGGCTCAGGCTC

Gene : HepCla 

Segment # : 107 
Offset : 1591 

1st Codon : 1 

YQATVCARAQAPPPSWDQMWKCLIRLKPTL
TACCAAGCCACAGTGTGTGCCAGAGCCCAAGCCCCTCCCCCTAGCTGGGACCAAATGTGGAAGTGTCTGATTAGGCTCAAGCCTACCCTC 

Gene : HepCla 

Segment # : 108
Offset : 1606
1st Codon : 1

WDQMWKCLIRLKPTLHGPTPLLYRLGAVQN
TGGGATCAGATGTGGAAATGCCTCATCAGACTGAAACCCACACTGCATGGCCCTACCCCTCTGCTCTACAGACTGGGAGCCGTCCAGAAT 

Gene : HepCla 

Segment# : 109 
Offset : 1621
1st Codon : 1

HGPTPLLYRLGAVQNEVTLTHPVTKYIMTC
CACGGACCCACACCCCTCCTGTATAGGCTCGGCGCTGTGCAAAACGAAGTGACACTGACACACCCTGTGACAAAGTATATCATGACCTGT

Gene : HepCla 

Segment # : 110 
Offset : 1636 

1st Codon : 1 

EVTLTHPVTKYIMTCMSADLEVVTSTWVLV 
GAGGTCACCCTCACCCATCCCGTCACCAAATACATTATGACATGCATGAGCGCTGACCTCGAGGTCGTGACAAGCACATGGGTCCTGGTC

Gene : HepCla 

Segment # : 111
Offset : 1651
1st Codon : 1

MSADLEVVTSTWVLVGGVLAALAAYCLSTG
ATGTCCGCCGATCTGGAAGTGGTCACCTCCACCTGGGTGCTCGTGGGAGGCGTCCTGGCTGCCCTCGCCGCTTACTGTCTGTCCACCGGA 

Gene : HepCla 

Segment # : 112
Offset : 1666
1st Codon : 1

GGVLAALAAYCLSTGCVVIVGRIVLSGKPA
GGCGGAGTGCTCGCCGCTCTGGCTGCCTATTGCCTCAGCACAGGCTGTGTGGTCATCGTCGGCAGAATCGTCCTGTCCGGCAAACCCGCT 

Gene : HepCla 

Segment # : 113
Offset : 1681
1st Codon : 1

CVVIVGRIVLSGKPAIIPDREVLYREFDEM
TGCGTCGTGATTGTGGGAAGGATTGTGCTCAGCGGAAAGCCTGCCATTATCCCTGACAGAGAGGTCCTGTATAGGGAATTCGATGAGATG 

Gene : HepCla 

Segment # : 114
Offset : 1696
1st Codon : 1

IIPDREVLYREFDEMEECSQHLPYIEQGMM
ATCATTCCCGATAGGGAAGTGCTCTACAGAGAGTTTGACGAAATGGAAGAGTGTAGCCAACACCTCCCCTATATCGAACAGGGAATGATG 

Gene : HepCla 

Segment # : 115
Offset : 1711
1st Codon : 1

EECSQHLPYIEQGMMLAEQFKQKALGLLQT
GAGGAATGCTCCCAGCATCTGCCTTACATTGAGCAAGGCATGATGCTCGCCGAACAGTTTAAGCAAAAGGCTCTGGGACTGCTCCAGACA 

Gene : HepCla 

Segment # : 116
Offset : 1726
1st Codon : 1

LAEQFKQKALGLLQTASRQAEVIAPAVQTN
CTGGCTGAGCAATTCAAACAGAAAGCCCTCGGCCTCCTGCAAACCGCTAGCAGACAGGCTGAGGTCATCGCTCCCGCTGTGCAAACCAAT

Gene : HepCla 

Segment # : 117 
Offset : 1741 
1st Codon : 1 

ASRQAEVIAPAVQTNWQKLEVFWAKHMWNF 
GCCTCCAGGCAAGGCGAAGTGATTGCCCCTGCCGTCCAGACAAACTGGCAGAAACTGGAAGTGTTTTGGGCTAAGCATATGTGGAACTTT

Gene : HepCla 

Segment # : 118 
Offset : 1756 
1st Codon : 1 

WQKLEVFWAKHMWNFISGIQYLAGLSTLPG
TGGCAAAAGCTCGAGGTCTTCTGGGCCAAACACATGTGGAATTTCATTAGCGGAATCCAATACCTCGCCGGACTGTCCACCCTCCCCGGA

Gene : HepCla 

Segment # : 119
Offset : 1771
1st Codon : 1

ISGIQYLAGLSTLPGNPAIASLMAFTAAVT
ATCTCCGGCATTCAGTATCTGGCTGGCCTCAGCACACTGCCTGGCAATCCCGCTATCGCTAGCCTCATGGCTTTCACAGCCGCTGTGACA 

Gene : HepCla 

Segment # : 120
Offset : 1786
1st Codon : 1

NPAIASLMAFTAAVTSPLTTSQTLLFNILG
AACCCTGCCATTGCCTCCCTGATGGCCTTTACCGCTGCCGTCACCTCCCCCCTCACCACAAGCCAAACCCTCCTGTTTAACATTCTGGGA

Gene : HepCla 

Segment # : 121 
Offset : 1801 
1st Codon : 1 

SPLTTSQTLLFNILGGWVAAQLAAPGAATA
AGCCCTCTGACAACCTCCCAGACACTGCTCTTCAATATCCTCGGCGGATGGGTCGCCGCTCAGCTCGCCGCTCCCGGAGCCGCTACCGCT 

Gene : HepCla 

Segment # : 122
Offset : 1816
1st Codon : 1

GWVAAQLAAPGAATAFVGAGLAGAAIGSVG
GGCTGGGTGGCTGCCCAACTGGCTGCCCCTGGCGCTGCCACAGCCTTTGTGGGAGCCGGACTGGCTGGCGCTGCCATTGGCTCCGTGGGA 

Gene : HepCla 

Segment # : 123
Offset : 1831
1st Codon : 1

FVGAGLAGAAIGSVGLGKVLVDILAGYGAG
TTCGTCGGCGCTGGCCTCGCCGGAGCCGCTATCGGAAGCGTCGGCCTCGGCAAAGTGCTCGTGGATATCCTCGCCGGATACGGAGCCGGA

Gene : HepCla 

Segment # : 124
Offset : 1846
1st Codon : 1

LGKVLVDILAGYGAGVAGALVAFKIMSGEV
CTGGGAAAGGTCCTGGTCGACATTCTGGCTGGCTATGGCGCTGGCGTCGCCGGAGCCCTCGTGGCTTTCAAAATCATGAGCGGAGAGGTC

Gene : HepCla 

Segment # : 125
Offset : 1861
1st Codon : 1

VAGALVAFKIMSGEVPSTEDLVNLLPAILS
GTGGCTGGCGCTCTGGTCGCCTTTAAGATTATGTCCGGCGAAGTGCCTAGCACAGAGGATCTGGTCAACCTCCTGCCTGCCATTCTGTCC

Gene : HepCla 

Segment # : 126
Offset : 1876
1st Codon : 1

PSTEDLVNLLPAILSPGALVVGVVCAAILR
CCCTCCACCGAAGACCTCGTGAATCTGCTCCCCGCTATCCTCAGCCCTGGCGCTCTGGTCGTGGGAGTGGTCTGCGCTGCCATTCTGAGA 

Gene : HepCla 

Segment # : 127
Offset : 1891
1st Codon : 1

PGALVVGVVCAAILRRHVGPGEGAVQWMNR
CCCGGAGCCCTCGTGGTCGGCGTCGTGTGTGCCGCTATCCTCAGGAGACACGTCGGCCCTGGCGAAGGCGCTGTGCAATGGATGAACAGA 

Gene : HepCla 

Segment # : 128
Offset : 1906
1st Codon : 1

RHVGPGEGAVQWMNRLIAFASRGNHVSPTH
AGGCATGTGGGACCCGGAGAGGGAGCCGTCCAGTGGATGAATAGGCTCATCGCTTTCGCTAGCAGAGGCAATCACGTCAGCCCTACCCAT

Gene : HepCla

Segment# : 129 
Offset : 1921 
1st Codon : 1 

LIAFASRGNHVSPTHYVPESDAAARVTAIL
CTGATTGCCTTTGCCTCCAGGGGAAACCATGTGTCCCCCACACACTATGTGCCTGAGTCCGACGCTGCCGCTAGGGTCACCGCTATCCTC 

Gene : HepCla 

Segment # : 130
Offset : 1936
1st Codon : 1

YVPESDAAARVTAILSSLTVTQLLRRLHQW
TACGTCCCCGAAAGCGATGCCGCTGCCAGAGTGACAGCCATTCTGTCCAGCCTCACCGTCACCCAACTGCTCAGGAGACTGCATCAGTGG 

Gene : HepCla 

Segment # : 131
Offset : 1951
1st Codon : 1

SSLTVTQLLRRLHQWISSECTTPCSGSWLR
AGCTCCCTGACAGTGACACAGCTCCTGAGAAGGCTCCACCAATGGATTAGCTCCGAGTGTACCACACCCTGTAGCGGAAGCTGGCTGAGA 

Gene : HepCla 

Segment # : 132 
Offset : 1966 
1st Codon : 1 

ISSECTTPCSGSWLRDIWDWICEVLSDFKT 
ATCTCCAGCGAATGCACAACCCCTTGCTCCGGCTCCTGGCTCAGGGATATCTGGGACTGGATCTGTGAGGTCCTGTCCGACTTTAAGACA

Gene : HepCla 

Segment# : 133 
Offset : 1981 

1st Codon : 1 

DIWDWICEVLSDFKTWLKAKLMPQLPGIPF
GACATTTGGGATTGGATTTGCGAAGTGCTCAGCGATTTCAAAACCTGGCTGAAAGCCAAACTGATGCCCCAACTGCCTGGCATTCCCTTT 

Gene : HepCla

Segment# : 134 
Offset : 1996 

1st Codon : 1 

WLKAKLMPQLPGIPFVSCQRGYKGVWRGDG 
TGGCTCAAGGCTAAGCTGATGCCTCAGCTCCCCGGAATCCCTTTCGTCAGCTGTCAGAGAGGCTATAAGGGAGTGTGGAGGGGAGACGGA

Gene : HepCla 

Segment # : 135
Offset : 2011 
1st Codon : 1 

VSCQRGYKGVWRGDGIMHTRCHCGAEITGH 
GTGTCCTGCCAAAGGGGATACAAAGGCGTCTGGAGAGGCGATGGCATTATGCATACCAGATGCCATTGCGGAGCCGAAATCACAGGCCAT 

Gene : HepCla 

Segment # : 136 
Offset : 2026 

1st Codon : 1 

IMHTRCHCGAEITGHVKNGTMRIVGPRTCR 
ATCATGCACACAAGGTGTCACTGTGGCGCTGAGATTACCGGACACGTCAAGAATGGCACAATGAGAATCGTCGGCCCTAGGACATGCAGA 

Figure 26 (Cont.)
WO 01/90197
PCT/AU01/00622
124/216

Gene : HepCla

Segment # : 137
Offset : 2041
1st Codon : 1

VKNGTMRIVGPRTCRNMWSGTFPINAYTTG
GTGAAAAACGGAACCATGAGGATTGTGGGACCCAGAACCTGTAGGAATATGTGGAGCGGAACCTTTCCCATTAACGCTTACACAACCGGA

Gene : HepCla 

Segment # : 138
Offset : 2056
1st Codon : 1

NMWSGTFPINAYTTGPCTPLPAPNYTFALW
AACATGTGGTCCGGCACATTCCCTATCAATGCCTATACCACAGGCCCTTGCACACCCCTCCCCGCTCCCAATTACACATTCGCTCTGTGG

Gene : HepCla 

Segment # : 139
Offset : 2071
1st Codon : 1

PCTPLPAPNYTFALWRVSAEEYVEIRRVGD
CCCTGTACCCCTCTGCCTGCCCCTAACTATACCTTTGCCCTCTGGAGAGTGTCCGCCGAAGAGTATGTGGAAATCAGAAGGGTCGGCGAT

Gene : HepCla 

Segment# : 140 
Offset : 2086 

1st Codon : 1 

RVSAEEYVEIRRVGDFHYVTGMTTDNLKCP 
AGGGTCAGCGCTGAGGAATACGTCGAGATTAGGAGAGTGGGAGACTTTCACTATGTGACAGGCATGACCACAGACAATCTGAAATGCCCT 

Gene : HepCla 

Segment # : 141 
Offset : 2101 

1st Codon : 1 

FHYVTGMTTDNLKCPCQVPSPEFFTELDGV
TTCCATTACGTCACCGGAATGACAACCGATAACCTCAAGTGTCCCTGTCAGGTCCCCTCCCCCGAATTCTTTACCGAACTGGATGGCGTC 

Gene : HepCla 

Segment# : 142 
Offset : 2116 

1st Codon : 1 

CQVPSPEFFTELDGVRLHRFAPPCKPLLRE
TGCCAAGTGCCTAGCCCTGAGTTTTTCACAGAGCTCGACGGAGTGAGACTGCATAGGTTTGCCCCTCCCTGTAAGCCTCTGCTCAGGGAA

Gene : HepCla 

Segment # : 143 
Offset : 2131 
1st Codon : 1 

RLHRFAPPCKPLLREEVSFRVGLHEYPVGS
AGGCTCCACAGATTCGCTCCCCCTTGCAAACCCCTCCTGAGAGAGGAAGTGTCCTTCAGAGTGGGACTGCATGAGTATCCCGTCGGCTCC 

Gene : HepCla 

Segment # : 144 
Offset : 2146
1st Codon : 1

EVSFRVGLHEYPVGSQLPCEPEPDVAVLTS
GAGGTCAGCTTTAGGGTCGGCCTCCACGAATACCCTGTGGGAAGCCAACTGCCTTGCGAACCCGAACCCGATGTGGCTGTGCTCACCTCC

Gene : HepCla 

Segment # : 145 
Offset : 2161 
1st Codon : 1 

QLPCEPEPDVAVLTSMLTDPSHITAEAAGR
CAGCTCCCCTGTGAGCCTGAGCCTGACGTCGCCGTCCTGACAAGCATGCTGACAGACCCTAGCCATATCACAGCCGAAGCCGCTGGCAGA

Gene : HepCla 

Segment# : 146 
Offset : 2176
1st Codon : 1

MLTDPSHITAEAAGRRLARGSPPSMASSSA
ATGCTCACCGATCCCTCCCACATTACCGCTGAGGCTGCCGGAAGGAGACTGGCTAGGGGAAGCCCTCCCTCCATGGCTAGCTCCAGCGCT 

Gene : HepCla 

Segment# : 147 
Offset : 2191 
1st Codon : 1

RLARGSPPSMASSSASQLSAPSLKATCTAN
AGGCTCGCCAGAGGCTCCCCCCCTAGCATGGCCTCCAGCTCCGCCTCCCAGCTCAGCGCTCCCTCCCTGAAAGCCACATGCACAGCCAAT

Gene : HepCla

Segment # : 148
Offset : 2206 
1st Codon : 1 

SQLSAPSLKATCTANHDSPDAELIEANLLW
AGCCAACTGTCCGCCCCTAGCCTCAAGGCTACCTGTACCGCTAACCATGACTCCCCCGATGCCGAACTGATTGAGGCTAACCTCCTGTGG

Gene : HepCla 

Segment # : 149 
Offset : 2221 

1st Codon : 1 

HDSPDAELIEANLLWRQEMGGNITRVESEN
CACGATAGCCCTGACGCTGAGCTCATCGAAGCCAATCTGCTCTGGAGACAGGAAATGGGAGGCAATATCACAAGGGTCGAGTCCGAGAAT

Gene : HepCla 

Segment # : 150
Offset : 2236
1st Codon : 1

RQEMGGNITRVESENKVVILDSFDPLVAEE
AGGCAAGAGATGGGCGGAAACATTACCAGAGTGGAAAGCGAAAACAAAGTGGTCATCCTCGACTCCTTCGATCCCCTCGTGGCTGAGGAA 

Gene : HepCla 

Segment # : 151 
Offset : 2251 
1st Codon : 1 

KVVILDSFDPLVAEEDEREISVPAEILRKS
AAGGTCGTGATTCTGGATAGCTTTGACCCTCTGGTCGCCGAAGAGGATGAGAGAGAGATTAGCGTCCCCGCTGAGATTCTGAGAAAGTCC

Gene : HepCla 

Segment # : 152 
Offset : 2266 

1st Codon : 1 

DEREISVPAEILRKSRRFAQALPVWARPDY
GACGAAAGGGAAATCTCCGTGCCTGCCGAAATCCTCAGGAAAAGCAGAAGGTTTGCCCAAGCCCTCCCCGTCTGGGCTAGGCCTGACTAT 

Gene : HepCla 

Segment # : 153 
Offset : 2281 
1st Codon : 1 

RRFAQALPVWARPDYNPPLVETWKKPDYEP
AGGAGATTCGCTCAGGCTCTGCCTGTGTGGGCCAGACCCGATTACAATCCCCCTCTGGTCGAGACATGGAAAAAGCCTGACTATGAGCCT

Gene : HepCla 

Segment # : 154 
Offset : 2296 

1st Codon : 1 

NPPLVETWKKPDYEPPVVHGCPLPPPRSPP
AACCCTCCCCTCGTGGAAACCTGGAAGAAACCCGATTACGAACCCCCTGTGGTCCACGGATGCCCTCTGCCTCCCCCTAGGTCCCCCCCT

Gene : HepCla 

Segment # : 155
Offset : 2311
1st Codon : 1

PVVHGCPLPPPRSPPVPPPRKKRTVVLTES
CCCGTCGTGCATGGCTGTCCCCTCCCCCCTCCCAGAAGCCCTCCCGTCCCCCCTCCCAGAAAGAAAAGGACAGTGGTCCTGACAGAGTCC

Gene : HepCla 

Segment# : 156 
Offset : 2326 

1st Codon : 1 

VPPPRKKRTVVLTESTLSTALAELATKSFG 
GTGCCTCCCCCTAGGAAAAAGAGAACCGTCGTGCTCACCGAAAGCACACTGTCCACCGCTCTGGCTGAGCTCGCCACAAAGTCCTTCGGA

Gene : HepCla 

Segment# : 157 
Offset : 2341 

1st Codon : 1 

TLSTALAELATKSFGSSSTSGITGDNTTTS
ACCCTCAGCACAGCCCTCGCCGAACTGGCTACCAAAAGCTTTGGCTCCAGCTCCACCTCCGGCATTACCGGAGACAATACCACAACCTCC

Gene : HepCla 

Segment # : 158
Offset : 2356
1st Codon : 1

SSSTSGITGDNTTTSSEPAPSGCPPDSDAE
AGCTCCAGCACAAGCGGAATCACAGGCGATAACACAACCACAAGCTCCGAGCCTGCCCCTAGCGGATGCCCTCCCGATAGCGATGCCGAA

Gene : HepCla 

Segment # : 159 
Offset : 2371 
1st Codon : 1 

SEPAPSGCPPDSDAESYSSMPPLEGEPGDP 
AGCGAACCCGCTCCCTCCGGCTGTCCCCCTGACTCCGACGCTGAGTCCTACTCCAGCATGCCCCCTCTGGAAGGCGAACCCGGAGACCCT 

Gene : HepCla 

Segment # : 160 
Offset : 2386 

1st Codon : 1 

SYSSMPPLEGEPGDPDLSDGSWSTVSSEAG 
AGCTATAGCTCCATGCCTCCCCTCGAGGGAGAGCCTGGCGATCCCGATCTGTCCGACGGAAGCTGGAGCACAGTGTCCAGCGAAGCCGGA

Gene : HepCla 

Segment# : 161 
Offset : 2401 
1st Codon : 1 

DLSDGSWSTVSSEAGTEDVVCCSMSYSWTG 
GACCTCAGCGATGGCTCCTGGTCCACCGTCAGCTCCGAGGCTGGCACAGAGGATGTGGTCTGCTGTAGCATGAGCTATAGCTGGACCGGA 

Gene : HepCla 

Segment # : 162 
Offset : 2416 

1st Codon : 1 

TEDVVCCSMSYSWTGALVTPCAAEEQKLPI 
ACCGAAGACGTCGTGTGTTGCTCCATGTCCTACTCCTGGACAGGCGCTCTGGTCACCCCTTGCGCTGCCGAAGAGCAAAAGCTCCCCATT 

Gene : HepCla 

Segment# : 163 
Offset : 2431 

1st Codon : 1 

ALVTPCAAEEQKLPINALSNSLLRHHNLVY 
GCCCTCGTGACACCCTGTGCCGCTGAGGAACAGAAACTGCCTATCAATGCCCTCAGCAATAGCCTCCTGAGACACCATAACCTCGTGTAT 

Gene : HepCla 

Segment # : 164 
Offset : 2446 

1st Codon : 1 

NALSNSLLRHHNLVYSTTSRSACQRQKKVT
AACGCTCTGTCCAACTCCCTGCTCAGGCATCACAATCTGGTCTACTCCACCACAAGCAGAAGCGCTTGCCAAAGGCAAAAGAAAGTGACA 

Gene : HepCla 

Segment # : 165 
Offset : 2461 
1st Codon : 1

STTSRSACQRQKKVTFDRLQVLDSHYQDVL
AGCACAACCTCCAGGTCCGCCTGTCAGAGACAGAAAAAGGTCACCTTTGACAGACTGCAAGTGCTCGACTCCCACTATCAGGATGTGCTC 

Gene : HepCla 

Segment # : 166 
Offset : 2476 
1st Codon : 1 

FDRLQVLDSHYQDVLKEVKAAASKVKANLL 
TTCGATAGGCTCCAGGTCCTGGATAGCCATTACCAAGACGTCCTGAAAGAGGTCAAGGCTGCCGCTAGCAAAGTGAAAGCCAATCTGCTC 

Gene : HepCla 

Segment# : 167 
Offset : 2491 

1st Codon : 1 

KEVKAAASKVKANLLSVEEACSLTPPHSAK 
AAGGAAGTGAAAGCCGCTGCCTCCAAGGTCAAGGCTAACCTCCTGTCCGTGGAAGAGGCTTGCTCCCTGACACCCCCTCACTCCGCCAAA 

Gene : HepCla 

Segment# : 168 
Offset : 2506 
1st Codon : 1 

SVEEACSLTPPHSAKSKFGYGAKDVRCHAR
AGCGTCGAGGAAGCCTGTAGCCTCACCCCTCCCCATAGCGCTAAGTCCAAGTTTGGCTATGGCGCTAAGGATGTGAGATGCCATGCCAGA 

Gene : HepCla

Segment # : 169
Offset : 2521 
1st Codon : 1 

SKFGYGAKDVRCHARKAVAHINSVWKDLLE
AGCAAATTCGGATACGGAGCCAAAGACGTCAGGTGTCACGCTAGGAAAGCCGTCGCCCATATCAATAGCGTCTGGA

Gene : HepCla 

Segment # : 170 
Offset : 2536 

1st Codon : 1 

KAVAHINSVWKDLLEDSVTPIDTTIMAKNE 
AAGGCTGTGGCTCACATTAACTCCGTGTGGAAGGATCTGCTCGAGGATAGCGTCACCCCTATCGATACCACAATCATGGCCAAAAACGAA

Gene : HepCla 

Segment # : 171
Offset : 2551 

1st Codon : 1 

DSVTPIDTTIMAKNEVFCVQPEKGGRKPAR 
GACTCCGTGACACCCATTGACACAACCATTATGGCTAAGAATGAGGTCTTCTGTGTGCAACCCGAAAAGGGAGGCAGAAAGCCTGCCAGA 

Gene : HepCla 

Segment # : 172
Offset : 2566
1st Codon : 1

VFCVQPEKGGRKPARLIVFPDLGVRVCEKM
GTGTTTTGCGTCCAGCCTGAGAAAGGCGGAAGGAAACCCGCTAGGCTCATCGTCTTCCCTGACCTCGGCGTCAGGGTCTGCGAAAAGATG

Gene : HepCla 

Segment # : 173
Offset : 2581
1st Codon : 1

LIVFPDLGVRVCEKMALYDVVSKLPLAVMG
CTGATTGTGTTTCCCGATCTGGGAGTGAGAGTGTGTGAGAAAATGGCTCTGTATGACGTCGTGTCCAAGCTCCCCCTCGCCGTCATGGGA 

Gene : HepCla 

Segment # : 174
Offset : 2596
1st Codon : 1

ALYDVVSKLPLAVMGSSYGFQYSPGQRVEF
GCCCTCTACGATGTGGTCAGCAAACTGCCTCTGGCTGTGATGGGCTCCAGCTATGGCTTTCAGTATAGCCCTGGCCAAAGGGTCGAGTTT 

Gene : HepCla 

Segment # : 175
Offset : 2611 
1st Codon : 1 

SSYGFQYSPGQRVEFLVQAWKSKKTPMGFS 
AGCTCCTACGGATTCCAATACTCCCCCGGACAGAGAGTGGAATTCCTCGTGCAAGCCTGGAAGTCCAAGAAAACCCCTATGGGATTCTCC 

Gene : HepCla 

Segment # : 176
Offset : 2626
1st Codon : 1

LVQAWKSKKTPMGFSYDTRCFDSTVTESDI
CTGGTCCAGGCTTGGAAAAGCAAAAAGACACCCATGGGCTTTAGCTATGACACAAGGTGTTTCGATAGCACAGTGACAGAGTCCGACATT 

Gene : HepCla 

Segment # : 177
Offset : 2641
1st Codon : 1

YDTRCFDSTVTESDIRTEEAIYQCCDLDPQ
TACGATACCAGATGCTTTGACTCCACCGTCACCGAAAGCGATATCAGAACCGAAGAGGCTATCTATCAGTGTTGCGATCTGGATCCCCAA

Gene : HepCla 

Segment # : 178
Offset : 2656
1st Codon : 1

RTEEAIYQCCDLDPQARVAIKSLTERLYVG
AGGACAGAGGAAGCCATTTACCAATGCTGTGACCTCGACCCTCAGGCTAGGGTCGCCATTAAGTCCCTGACAGAGAGACTGTATGTGGGA 

Gene : HepCla 

Segment # : 179
Offset : 2671
1st Codon : 1 

ARVAIKSLTERLYVGGPLTNSRGENCGYRR 
GCCAGAGTGGCTATCAAAAGCCTCACCGAAAGGCTCTACGTCGGCGGACCCCTCACCAATAGCAGAGGCGAAAACTGTGGCTATAGGAGA 

Gene : HepCla 

Segment # : 180 
Offset : 2686 
1st Codon : 1 

GPLTNSRGENCGYRRCRASGVLTTSCGNTL
GGCCCTCTGACAAACTCCAGGGGAGAGAATTGCGGATACAGAAGGTGTAGGGCTAGCGGAGTGCTCACCACAAGCTGTGGCAATACCCTC 

Gene : HepCla 

Segment# : 181 
Offset : 2701
1st Codon : 1 

CRASGVLTTSCGNTLTCYIKARAACRAAGL 
TGCAGAGCCTCCGGCGTCCTGACAACCTCCTGCGGAAACACACTGACATGCTATATCAAAGCCAGAGCCGCTTGCAGAGCCGCTGGCCTC 

Gene : HepCla 

Segment # : 182
Offset : 2716 
1st Codon : 1 

TCYIKARAACRAAGLQDCTMLVCGDDLVVI 
ACCTGTTACATTAAGGCTAGGGCTGCCTGTAGGGCTGCCGGACTGCAAGACTGTACCATGCTGGTCTGCGGAGACGATCTGGTCGTGATT 

Gene : HepCla 

Segment # : 183
Offset : 2731 

1st Codon : 1 

QDCTMLVCGDDLVVICESAGVQEDAASLRA 
CAGGATTGCACAATGCTCGTGTGTGGCGATGACCTCGTGGTCATCTGTGAGTCCGCCGGAGTGCAAGAGGATGCCGCTAGCCTCAGGGCT 

Gene : HepCla 

Segment # : 184
Offset : 2746
1st Codon : 1

CESAGVQEDAASLRAFTEAMTRYSAPPGDP
TGCGAAAGCGCTGGCGTCCAGGAAGACGCTGCCTCCCTGAGAGCCTTTACCGAAGCCATGACCAGATACTCCGCCCCTCCCGGAGACCCT

Gene : HepCla 

Segment # : 185
Offset : 2761 
1st Codon : 1 

FTEAMTRYSAPPGDPPQPEYDLELITSCSS 
TTCACAGAGGCTATGACAAGGTATAGCGCTCCCCCTGGCGATCCCCCTCAGCCTGAGTATGACCTCGAGCTCATCACAAGCTGTAGCTCC 

Gene : HepCla

Segment # : 186
Offset : 2776 

1st Codon : 1 

PQPEYDLELITSCSSNVSVAHDGAGKRVYY 
CCCCAACCCGAATACGATCTGGAACTGATTACCTCCTGCTCCAGCAATGTGTCCGTGGCTCACGATGGCGCTGGCAAAAGGGTCTACTAT 

Gene : HepCla 

Segment # : 187
Offset : 2791
1st Codon : 1

NVSVAHDGAGKRVYYLTRDPTTPLARAAWE
AACGTCAGCGTCGCCCATGACGGAGCCGGAAAGAGAGTGTATTACCTCACCAGAGACCCTACCACACCCCTCGCCAGAGCCGCTTGGGAA

Gene : HepCla 

Segment # : 188
Offset : 2806
1st Codon : 1

LTRDPTTPLARAAWETARHTPVNSWLGNII
CTGACAAGGGATCCCACAACCCCTCTGGCTAGGGCTGCCTGGGAGACAGCCAGACACACACCCGTCAACTCCTGGCTCGGCAATATCATT 

Gene : HepCla 

Segment # : 189
Offset : 2821
1st Codon : 1

TARHTPVNSWLGNIIMFAPTLWARMILMTH
ACCGCTAGGCATACCCCTGTGAATAGCTGGCTGGGAAACATTATCATGTTCGCTCCCACACTGTGGGCCAGAATGATTCTGATGACCCAT

Gene : HepCla

Segment # : 190 
Offset : 2836 

1st Codon : 1 

MFAPTLWARMILMTHFFSVLIARDQLEQAL
ATGTTTGCCCCTACCCTCTGGGCTAGGATGATCCTCATGACACACTTTTTCTCCGTGCTCATCGCTAGGGATCAGCTCGAGCAAGCCCTC 

Gene : HepCla

Segment # : 191 
Offset : 2851 
1st Codon : 1 

FFSVLIARDQLEQALDCEIYGACYSIEPLD
TTCTTTAGCGTCCTGATTGCCAGAGACCAACTGGAACAGGCTCTGGATTGCGAAATCTATGGCGCTTGCTATAGCATTGAGCCTCTGGAT 

Gene : HepCla 

Segment # : 192 
Offset : 2866 

1st Codon : 1 

DCEIYGACYSIEPLDLPPIIQRLHGLSAFS
GACTGTGAGATTTACGGAGCCTGTTACTCCATCGAACCCCTCGACCTCCCCCCTATCATTCAGAGACTGCATGGCCTCAGCGCTTTCTCC 

Gene : HepCla 

Segment # : 193
Offset : 2881 

1st Codon : 1 

LPPIIQRLHGLSAFSLHSYSPGEINRVAAC 
CTGCCTCCCATTATCCAAAGGCTCCACGGACTGTCCGCCTTTAGCCTCCACTCCTACTCCCCCGGAGAGATTAACAGAGTGGCTGCCTGT 

Gene : HepCla 

Segment # : 194
Offset : 2896
1st Codon : 1

LHSYSPGEINRVAACLRKLGVPPLRAWRHR
CTGCATAGCTATAGCCCTGGCGAAATCAATAGGGTCGCCGCTTGCCTCAGGAAACTGGGAGTGCCTCCCCTCAGGGCTTGGAGACACAGA 

Gene : HepCla 

Segment # : 195
Offset : 2911 
1st Codon : 1 

LRKLGVPPLRAWRHRARSVRARLLARGGRA 
CTGAGAAAGCTCGGCGTCCCCCCTCTGAGAGCCTGGAGGCATAGGGCTAGGTCCGTGAGAGCCAGACTGCTCGCCAGAGGCGGAAGGGCT 

Gene : HepCla 

Segment # : 196
Offset : 2926
1st Codon : 1

ARSVRARLLARGGRAAICGKYLFNWAVRTK
GCCAGAAGCGTCAGGGCTAGGCTCCTGGCTAGGGGAGGCAGAGCCGCTATCTGTGGCAAATACCTCTTCAATTGGGCTGTGAGAACCAAA 

Gene : HepCla 

Segment # : 197
Offset : 2941
1st Codon : 1

AICGKYLFNWAVRTKLKLTPIAAAGRLDLS
GCCATTTGCGGAAAGTATCTGTTTAACTGGGCCGTCAGGACAAAGCTCAAGCTCACCCCTATCGCTGCCGCTGGCAGACTGGATCTGTCC 

Gene : HepCla 

Segment # : 198
Offset : 2956
1st Codon : 1

LKLTPIAAAGRLDLSGWFTAGYSGGDIYHS
CTGAAACTGACACCCATTGCCGCTGCCGGAAGGCTCGACCTCAGCGGATGGTTTACCGCTGGCTATAGCGGAGGCGATATCTATCACTCC 

Gene : HepCla 

Segment # : 199
Offset : 2971
1st Codon : 1

GWFTAGYSGGDIYHSVSHARPRWFWFCLLL
GGCTGGTTCACAGCCGGATACTCCGGCGGAGACATTTACCATAGCGTCAGCCATGCCAGACCCAGATGGTTTTGGTTTTGCCTCCTGCTC 

Gene : HepCla 

Segment # : 200
Offset : 2986 
1st Codon : 1 

VSHARPRWFWFCLLLLAAGVGIYLLPNRAA
GTGTCCCACGCTAGGCCTAGGTGGTTCTGGTTCTGTCTGCTCCTGCTCGCCGCTGGCGTCGGCATTTACCTCCTGCCTAACAGAGCCGCT 

Gene : HepCla

Segment # : 201
Offset : 3001
1st Codon : 1

LAAGVGIYLLPNRAA
CTGGCTGCCGGAGTGGGAATCTATCTGCTCCCCAATAGGGCTGCC 
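
The segment numbers and offsets listed above follow a regular pattern: each segment is a 30-residue window, consecutive windows start 15 residues apart, and the offset of segment n is (n - 1) x 15 + 1. The following is a minimal Python sketch of that windowing, inferred only from the listed offsets and not taken from the specification itself; the function name `overlapping_segments` and its parameters are hypothetical.

```python
def overlapping_segments(protein, window=30, step=15, first_number=1):
    """Return (segment_number, offset, peptide) tuples for overlapping windows.

    Offsets are 1-based positions in the full parent sequence, assuming
    `protein` begins at the start of segment `first_number`.
    The window and step sizes are assumptions read off the listing.
    """
    segments = []
    for i, start in enumerate(range(0, len(protein), step)):
        number = first_number + i
        offset = (number - 1) * step + 1
        # The final window is shorter when fewer than `window` residues remain.
        segments.append((number, offset, protein[start:start + window]))
    return segments


# Tail of the parent sequence, reassembled from segments 199 to 201 above.
tail = "GWFTAGYSGGDIYHS" "VSHARPRWFWFCLLL" "LAAGVGIYLLPNRAA"
for number, offset, peptide in overlapping_segments(tail, first_number=199):
    print(number, offset, peptide)
```

Under these assumptions the sketch reproduces the numbering, the offsets and the shortened 15-residue final segment shown for segments 199 to 201.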

Segments in scrambled order: 



HepCla #77 

VIPVRRRGDSRGSLLSPRPISYLKGSSGGP
GTGATTCCCGTCAGGAGAAGGGGAGACTCCAGGGGAAGCCTCCTGTCCCCCAGACCCATTAGCTATCTGAAAGGCTCCAGCGGAGGCCCT 

HepCla #68 

ARRGREILLGPADGMVSKGWRLLAPITAYA
GCCAGAAGGGGAAGGGAAATCCTCCTGGGACCCGCTGACGGAATGGTCAGCAAAGGCTGGAGGCTCCTGGCTCCCATTACCGCTTACGCT 

HepCla #143 

RLHRFAPPCKPLLREEVSFRVGLHEYPVGS 
AGGCTCCACAGATTCGCTCCCCCTTGCAAACCCCTCCTGAGAGAGGAAGTGTCCTTCAGAGTGGGACTGCATGAGTATCCCGTCGGCTCC 

HepCla #66 

VVFSQMETKLITWGADTAACGDIINGLPVS
GTGGTCTTCTCCCAGATGGAGACAAAGCTCATCACATGGGGAGCCGATACCGCTGCCTGTGGCGATATCATTAACGGACTGCCTGTGTCC 

HepCla #79 

LLCPAGHAVGIFRAAVCTRGVAKAVDFIPV
CTGCTCTGCCCTGCCGGACACGCTGTGGGAATCTTTAGGGCTGCCGTCTGCACAAGGGGAGTGGCTAAGGCTGTGGATTTCATTCCCGTC 

HepCla #113 

CVVIVGRIVLSGKPAIIPDREVLYREFDEM
TGCGTCGTGATTGTGGGAAGGATTGTGCTCAGCGGAAAGCCTGCCATTATCCCTGACAGAGAGGTCCTGTATAGGGAATTCGATGAGATG

HepCla #139 

PCTPLPAPNYTFALWRVSAEEYVEIRRVGD
CCCTGTACCCCTCTGCCTGCCCCTAACTATACCTTTGCCCTCTGGAGAGTGTCCGCCGAAGAGTATGTGGAAATCAGAAGGGTCGGCGAT

HepCla #174 

ALYDVVSKLPLAVMGSSYGFQYSPGQRVEF
GCCCTCTACGATGTGGTCAGCAAACTGCCTCTGGCTGTGATGGGCTCCAGCTATGGCTTTCAGTATAGCCCTGGCCAAAGGGTCGAGTTT 

HepCla #57 

ISWCLWWLQYFLTRVEAQLHVWVPPLNVRG
ATCTCCTGGTGTCTGTGGTGGCTCCAGTATTTCCTCACCAGAGTGGAAGCCCAACTGCATGTGTGGGTGCCTCCCCTCAACGTCAGGGGA 

HepCla #51 

ENLVILNAASLAGTHGLVSFLVFFCFAWYL
GAGAATCTGGTCATCCTCAACGCTGCCTCCCTGGCTGGCACACACGGACTGGTCAGCTTTCTGGTCTTCTTTTGCTTTGCCTGGTACCTC

HepCla #193 

LPPIIQRLHGLSAFSLHSYSPGEINRVAAC
CTGCCTCCCATTATCCAAAGGCTCCACGGACTGTCCGCCTTTAGCCTCCACTCCTACTCCCCCGGAGAGATTAACAGAGTGGCTGCCTGT

HepCla #154 

NPPLVETWKKPDYEPPVVHGCPLPPPRSPP 
AACCCTCCCCTCGTGGAAACCTGGAAGAAACCCGATTACGAACCCCCTGTGGTCCACGGATGCCCTCTGCCTCCCCCTAGGTCCCCCCCT

HepCla #48 

GVGSSIASWAIKWEYVVLLFLLLADARVCS 
GGCGTCGGCTCCAGCATTGCCTCCTGGGCTATCAAATGGGAATACGTCGTGCTCCTGTTTCTGCTCCTGGCTGACGCTAGGGTCTGCTCC

HepCla #37 

LNNTRPPLGNWFGCTWMNSTGFTKVCGAPP
CTGAATAACACAAGGCCTCCCCTCGGCAATTGGTTTGGCTGTACCTGGATGAATAGCACAGGCTTTACCAAAGTGTGTGGCGCTCCCCCT 

HepCla #185 

FTEAMTRYSAPPGDPPQPEYDLELITSCSS
TTCACAGAGGCTATGACAAGGTATAGCGCTCCCCCTGGCGATCCCCCTCAGCCTGAGTATGACCTCGAGCTCATCACAAGCTGTAGCTCC

HepCla #54

WPLLLLLLALPQRAYALDTEVAASCGGVVL
TGGCCTCTGCTCCTGCTCCTGCTCGCCCTCCCCCAAAGGGCTTACGCTCTGGATACCGAAGTGGCTGCCTCCTGCGGAGGCGTCGTGCTC

HepCla #70 

QQTRGLLGCIITSLTGRDKNQVEGEVQIVS 
CAGCAAACCAGAGGCCTCCTGGGATGCATTATCACAAGCCTCACCGGAAGGGATAAGAATCAGGTCGAGGGAGAGGTCCAGATTGTGTCC 

HepCla #82 

SSPPAVPQSFQVAHLHAPTGSGKSTKVPAA 
AGCTCCCCCCCTGCCGTCCCCCAAAGCTTTCAGGTCGCCCATCTGCATGCCCCTACCGGAAGCGGAAAGTCCACCAAAGTGCCTGCCGCT 

HepCla #104 

NTPGLPVCQDHLEFWEGVFTGLTHIDAHFL
AACACACCCGGACTGCCTGTGTGTCAGGATCACCTCGAGTTTTGGGAAGGCGTCTTCACAGGCCTCACCCATATCGATGCCCATTTCCTC

HepCla #26 

VLLLFAGVDAETHVTGGNAGRTTSGLVSLL
GTGCTCCTGCTCTTCGCTGGCGTCGACGCTGAGACACACGTCACCGGAGGCAATGCCGGAAGGACAACCTCCGGCCTCGTGTCCCTGCTC 

HepCla #110 

EVTLTHPVTKYIMTCMSADLEVVTSTWVLV
GAGGTCACCCTCACCCATCCCGTCACCAAATACATTATGACATGCATGAGCGCTGACCTCGAGGTCGTGACAAGCACATGGGTCCTGGTC

HepCla #56 

VGLMALTLSPYYKRYISWCLWWLQYFLTRV
GTGGGACTGATGGCCCTCACCCTCAGCCCTTACTATAAGAGATACATTAGCTGGTGCCTCTGGTGGCTGCAATACTTTCTGACAAGGGTC 

HepCla #197 

AICGKYLFNWAVRTKLKLTPIAAAGRLDLS
GCCATTTGCGGAAAGTATCTGTTTAACTGGGCCGTCAGGACAAAGCTCAAGCTCACCCCTATCGCTGCCGCTGGCAGACTGGATCTGTCC 

HepCla #25 

IAYFSMVGNWAKVLVVLLLFAGVDAETHVT
ATCGCTTACTTTAGCATGGTGGGAAACTGGGCCAAAGTGCTCGTGGTCCTGCTCCTGTTTGCCGGAGTGGATGCCGAAACCCATGTGACA

HepCla #147 

RLARGSPPSMASSSASQLSAPSLKATCTAN
AGGCTCGCCAGAGGCTCCCCCCCTAGCATGGCCTCCAGCTCCGCCTCCCAGCTCAGCGCTCCCTCCCTGAAAGCCACATGCACAGCCAAT 

HepCla #52 

GLVSFLVFFCFAWYLKGRWVPGAVYALYGM 
GGCCTCGTGTCCTTCCTCGTGTTTTTCTGTTTCGCTTGGTATCTGAAAGGCAGATGGGTCCCCGGAGCCGTCTACGCTCTGTATGGCATG

HepCla #145 

QLPCEPEPDVAVLTSMLTDPSHITAEAAGR
CAGCTCCCCTGTGAGCCTGAGCCTGACGTCGCCGTCCTGACAAGCATGCTGACAGACCCTAGCCATATCACAGCCGAAGCCGCTGGCAGA 

HepCla #171 

DSVTPIDTTIMAKNEVFCVQPEKGGRKPAR 
GACTCCGTGACACCCATTGACACAACCATTATGGCTAAGAATGAGGTCTTCTGTGTGCAACCCGAAAAGGGAGGCAGAAAGCCTGCCAGA 

HepCla #84 

YAAQGYKVLVLNPSVAATLGFGAYMSKAHG 
TACGCTGCCCAAGGCTATAAGGTCCTGGTCCTGAATCCCTCCGTGGCTGCCACACTGGGATTCGGAGCCTATATGTCCAAGGCTCACGGA 

HepCla #14 

VRNSTGLYHVTNDCPNSSIVYEAADAILHT
GTGAGAAACTCCACCGGACTGTATCACGTCACCAATGACTGTCCCAATAGCTCCATCGTCTACGAAGCCGCTGACGCTATCCTCCACACA

HepCla #175 

SSYGFQYSPGQRVEFLVQAWKSKKTPMGFS 
AGCTCCTACGGATTCCAATACTCCCCCGGACAGAGAGTGGAATTCCTCGTGCAAGCCTGGAAGTCCAAGAAAACCCCTATGGGATTCTCC 

HepCla #67 

DTAACGDIINGLPVSARRGREILLGPADGM
GACACAGCCGCTTGCGGAGACATTATCAATGGCCTCCCCGTCAGCGCTAGGAGAGGCAGAGAGATTCTGCTCGGCCCTGCCGATGGCATG

HepCla #148 

SQLSAPSLKATCTANHDSPDAELIEANLLW
AGCCAACTGTCCGCCCCTAGCCTCAAGGCTACCTGTACCGCTAACCATGACTCCCCCGATGCCGAACTGATTGAGGCTAACCTCCTGTGG

HepCla #120

NPAIASLMAFTAAVTSPLTTSQTLLFNILG
AACCCTGCCATTGCCTCCCTGATGGCCTTTACCGCTGCCGTCACCTCCCCCCTCACCACAAGCCAAACCCTCCTGTTTAACATTCTGGGA

HepCla #176

LVQAWKSKKTPMGFSYDTRCFDSTVTESDI
CTGGTCCAGGCTTGGAAAAGCAAAAAGACACCCATGGGCTTTAGCTATGACACAAGGTGTTTCGATAGCACAGTGACAGAGTCCGACATT

HepCla #152 

DEREISVPAEILRKSRRFAQALPVWARPDY 
GACGAAAGGGAAATCTCCGTGCCTGCCGAAATCCTCAGGAAAAGCAGAAGGTTTGCCCAAGCCCTCCCCGTCTGGGCTAGGCCTGACTAT

HepCla #190 

MFAPTLWARMILMTHFFSVLIARDQLEQAL
ATGTTTGCCCCTACCCTCTGGGCTAGGATGATCCTCATGACACACTTTTTCTCCGTGCTCATCGCTAGGGATCAGCTCGAGCAAGCCCTC 

HepCla #96 

SVIPTSGDVVVVATDALMTGYTGDFDSVID
AGCGTCATCCCTACCTCCGGCGATGTGGTCGTGGTCGCCACAGACGCTCTGATGACCGGATACACAGGCGATTTCGATAGCGTCATCGAT 

HepCla #94 

CHSKKKCDELAAKLVALGINAVAYYRGLDV 
TGCCATAGCAAAAAGAAATGCGATGAGCTCGCCGCTAAGCTCGTGGCTCTGGGAATCAATGCCGTCGCCTATTACAGAGGCCTCGACGTC

HepCla #46 

VLPCSFTTLPALSTGLIHLHQNIVDVQYLY
GTGCTCCCCTGTAGCTTTACCACACTGCCTGCCCTCAGCACAGGCCTCATCCATCTGCATCAGAATATCGTCGACGTCCAGTATCTGTAT 

HepCla #53 

KGRWVPGAVYALYGMWPLLLLLLALPQRAY
AAGGGAAGGTGGGTGCCTGGCGCTGTGTATGCCCTCTACGGAATGTGGCCCCTCCTGCTCCTGCTCCTGGCTCTGCCTCAGAGAGCCTAT

HepCla #87

SPITYSTYGKFLADGGCSGGAYDIIICDEC
AGCCCTATCACATACTCCACCTATGGCAAATTCCTCGCCGATGGCGGATGCTCCGGCGGAGCCTATGACATTATCATTTGCGATGAGTGT

HepCla #196 

ARSVRARLLARGGRAAICGKYLFNWAVRTK
GCCAGAAGCGTCAGGGCTAGGCTCCTGGCTAGGGGAGGCAGAGCCGCTATCTGTGGCAAATACCTCTTCAATTGGGCTGTGAGAACCAAA 

HepCla #170 

KAVAHINSVWKDLLEDSVTPIDTTIMAKNE 
AAGGCTGTGGCTCACATTAACTCCGTGTGGAAGGATCTGCTCGAGGATAGCGTCACCCCTATCGATACCACAATCATGGCCAAAAACGAA

HepCla #35 

FTPSPVVVGTTDRSGAPTYSWGANDTDVFV
TTCACACCCTCCCCCGTCGTGGTCGGCACAACCGATAGGTCCGGCGCTCCCACATACTCCTGGGGAGCCAATGACACAGACGTCTTCGTC

HepCla #16 

PGCVPCVREGNASRCWVAMTPTVATRDGKL 
CCCGGATGCGTCCCCTGTGTGAGAGAGGGAAACGCTAGCAGATGCTGGGTGGCTATGACACCCACAGTGGCTACCAGAGACGGAAAGCTC 

HepCla #183 

QDCTMLVCGDDLVVICESAGVQEDAASLRA
CAGGATTGCACAATGCTCGTGTGTGGCGATGACCTCGTGGTCATCTGTGAGTCCGCCGGAGTGCAAGAGGATGCCGCTAGCCTCAGGGCT 

HepCla #125 

VAGALVAFKIMSGEVPSTEDLVNLLPAILS 
GTGGCTGGCGCTCTGGTCGCCTTTAAGATTATGTCCGGCGAAGTGCCTAGCACAGAGGATCTGGTCAACCTCCTGCCTGCCATTCTGTCC 

HepCla #177 

YDTRCFDSTVTESDIRTEEAIYQCCDLDPQ 
TACGATACCAGATGCTTTGACTCCACCGTCACCGAAAGCGATATCAGAACCGAAGAGGCTATCTATCAGTGTTGCGATCTGGATCCGCAA 

HepCla #103 

ELTPAETTVRLRAYMNTPGLPVCQDHLEFW
GAGCTCACCCCTGCCGAAACCACAGTGAGACTGAGAGCCTATATGAATACCCCTGGCCTCCCCGTCTGCCAAGACCATCTGGAATTCTGG 

HepCla #186 

PQPEYDLELITSCSSNVSVAHDGAGKRVYY
CCCCAACCCGAATACGATCTGGAACTGATTACCTCCTGCTCCAGCAATGTGTCCGTGGCTCACGATGGCGCTGGCAAAAGGGTCTACTAT

HepCla #9

LGKVIDTLTCGFADLMGYIPLVGAPLGGAA
CTGGGAAAGGTCATCGATACCCTCACCTGTGGCTTTGCCGATCTGATGGGCTATATCCCTCTGGTCGGCGCTCCCCTCGGCGGAGCCGCT

HepCla #93 

AIPLEVIKGGRHLIFCHSKKKCDELAAKLV
GCCATTCCCCTCGAGGTCATCAAAGGCGGAAGGCATCTGATTTTCTGTCACTCCAAGAAAAAGTGTGACGAACTGGCTGCCAAACTGGTC 

HepCla #112 

GGVLAALAAYCLSTGCVVIVGRIVLSGKPA 
GGCGGAGTGCTCGCCGCTCTGGCTGCCTATTGCCTCAGCACAGGCTGTGTGGTCATCGTCGGCAGAATCGTCCTGTCCGGCAAACCCGCT

HepCla #184 

CESAGVQEDAASLRAFTEAMTRYSAPPGDP
TGCGAAAGCGCTGGCGTCCAGGAAGACGCTGCCTCCCTGAGAGCCTTTACCGAAGCCATGACCAGATACTCCGCCCCTCCCGGAGACCCT 

HepCla #199 

GWFTAGYSGGDIYHSVSHARPRWFWFCLLL
GGCTGGTTCACAGCCGGATACTCCGGCGGAGACATTTACCATAGCGTCAGCCATGCCAGACCCAGATGGTTTTGGTTTTGCCTCCTGCTC

HepCla #158 

SSSTSGITGDNTTTSSEPAPSGCPPDSDAE
AGCTCCAGCACAAGCGGAATCACAGGCGATAACACAACCACAAGCTCCGAGCCTGCCCCTAGCGGATGCCCTCCCGATAGCGATGCCGAA

HepCla #100

RTQRRGRTGRGKPGIYRFVAPGERPSGMFD
AGGACACAGAGAAGGGGAAGGACAGGCAGAGGCAAACCCGGAATCTATAGGTTTGTGGCTCCCGGAGAGAGACCCTCCGGCATGTTCGAT

HepCla #43 

VRMYVGGVEHRLEAACNWTRGERCDLEDRD
GTGAGAATGTATGTGGGAGGCGTCGAGCATAGGCTCGAGGCTGCCTGTAACTGGACCAGAGGCGAAAGGTGTGACCTCGAGGATAGGGAT

HepCla #58 

EAQLHVWVPPLNVRGGRDAVILLMCVVHPT 
GAGGCTCAGCTCCACGTCTGGGTCCCCCCTCTGAATGTGAGAGGCGGAAGGGATGCCGTCATCCTCCTGATGTGCGTCGTGCATCCCACA 

HepCla #4 

LGVRATRKTSERSQPRGRRQPIPKARRPEG 
CTGGGAGTGAGAGCCACAAGGAAAACCTCCGAGAGAAGCCAACCCAGAGGCAGAAGGCAACCCATTCCCAAAGCCAGAAGGCCTGAGGGA 

HepCla #187

NVSVAHDGAGKRVYYLTRDPTTPLARAAWE
AACGTCAGCGTCGCCCATGACGGAGCCGGAAAGAGAGTGTATTACCTCACCAGAGACCCTACCACACCCCTCGCCAGAGCCGCTTGGGAA 

HepCla #159

SEPAPSGCPPDSDAESYSSMPPLEGEPGDP 
AGCGAACCCGCTCCCTCCGGCTGTCCCCCTGACTCCGACGCTGAGTCCTACTCCAGCATGCCCCCTCTGGAAGGCGAACCCGGAGACCCT 

HepCla #63 

IGGHYVQMAIIKLGALTGTYVYNHLTPLRD
ATCGGAGGCCATTACGTCCAGATGGCCATTATCAAACTGGGAGCCCTCACCGGAACCTATGTGTATAACCATCTGACACCCCTCAGGGAT

HepCla #126 

PSTEDLVNLLPAILSPGALVVGVVCAAILR
CCCTCCACCGAAGACCTCGTGAATCTGCTCCCCGCTATCCTCAGCCCTGGCGCTCTGGTCGTGGGAGTGGTCTGCGCTGCCATTCTGAGA

HepCla #24 

ILDMIAGAHWGVLAGIAYFSMVGNWAKVLV
ATCCTCGACATGATCGCTGGCGCTCACTGGGGCGTCCTGGCTGGCATTGCCTATTTCTCCATGGTCGGCAATTGGGCTAAGGTCCTGGTC 

HepCla #7 

EGCGWAGWLLSPRGSRPSWGPTDPRRRSRN
GAGGGATGCGGATGGGCTGGCTGGCTGCTCAGCCCTAGGGGAAGCAGACCCTCCTGGGGACCCACAGACCCTAGGAGAAGGTCCAGGAAT 

HepCla #21 

WTTQGCNCSIYPGHITGHRMAWDMMMNWSP 
TGGACAACCCAAGGCTGTAACTGTAGCATTTACCCTGGCCATATCACAGGCCATAGGATGGCCTGGGACATGATGATGAACTGGAGCCCT

HepCla #17 

WVAMTPTVATRDGKLPATQLRRHIDLLVGS
TGGGTCGCCATGACCCCTACCGTCGCCACAAGGGATGGCAAACTGCCTGCCACACAGCTCAGGAGACACATTGACCTCCTGGTCGGCTCC 

HepCla #42 

RLWHYPCTINYTIFKVRMYVGGVEHRLEAA
AGGCTCTGGCATTACCCTTGCACAATCAATTACACAATCTTTAAGGTCAGGATGTACGTCGGCGGAGTGGAACACAGACTGGAAGCCGCT 

HepCla #172 

VFCVQPEKGGRKPARLIVFPDLGVRVCEKM
GTGTTTTGCGTCCAGCCTGAGAAAGGCGGAAGGAAACCCGCTAGGCTCATCGTCTTCCCTGACCTCGGCGTCAGGGTCTGCGAAAAGATG 

HepCla #10 

MGYIPLVGAPLGGAARALAHGVRVLEDGVN
ATGGGATACATTCCCCTCGTGGGAGCCCCTCTGGGAGGCGCTGCCAGAGCCCTCGCCCATGGCGTCAGGGTCCTGGAAGACGGAGTGAAT

HepCla #27 

GGNAGRTTSGLVSLLTPGAKQNIQLINTNG
GGCGGAAACGCTGGCAGAACCACAAGCGGACTGGTCAGCCTCCTGACACCCGGAGCCAAACAGAATATCCAACTGATTAACACAAACGGA 

HepCla #13 

LALLSCLTVPASAYQVRNSTGLYHVTNDCP
CTGGCTCTGCTCAGCTGTCTGACAGTGCCTGCCTCCGCCTATCAGGTCAGGAATAGCACAGGCCTCTACCATGTGACAAACGATTGCCCT 

HepCla #71 

GRDKNQVEGEVQIVSTAAQTFLATCINGVC
GGCAGAGACAAAAACCAAGTGGAAGGCGAAGTGCAAATCGTCAGCACAGCCGCTCAGACATTCCTCGCCACATGCATTAACGGAGTGTGT 

HepCla #18 

PATQLRRHIDLLVGSATLCSALYVGDLCGS 
CCCGCTACCCAACTGAGAAGGCATATCGATCTGCTCGTGGGAAGCGCTACCCTCTGCTCCGCCCTCTACGTCGGCGATCTGTGTGGCTCC

HepCla #83 

HAPTGSGKSTKVPAAYAAQGYKVLVLNPSV
CACGCTCCCACAGGCTCCGGCAAAAGCACAAAGGTCCCCGCTGCCTATGCCGCTCAGGGATACAAAGTGCTCGTGCTCAACCCTAGCGTC 

HepCla #6 

RTWAQPGYPWPLYGNEGCGWAGWLLSPRGS
AGGACATGGGCTCAGCCTGGCTATCCCTGGCCCCTCTACGGAAACGAAGGCTGTGGCTGGGCCGGATGGCTCCTGTCCCCCAGAGGCTCC 

HepCla #162 

TEDVVCCSMSYSWTGALVTPCAAEEQKLPI
ACCGAAGACGTCGTGTGTTGCTCCATGTCCTACTCCTGGACAGGCGCTCTGGTCACCCCTTGCGCTGCCGAAGAGCAAAAGCTCCCCATT 

HepCla #55 

ALDTEVAASCGGVVLVGLMALTLSPYYKRY
GCCCTCGACACAGAGGTCGCCGCTAGCTGTGGCGGAGTGGTCCTGGTCGGCCTCATGGCTCTGACACTGTCCCCCTATTACAAAAGGTAT

HepCla #38

WMNSTGFTKVCGAPPCVIGGAGNNTLHCPT
TGGATGAACTCCACCGGATTCACAAAGGTCTGCGGAGCCCCTCCCTGTGTGATTGGCGGAGCCGGAAACAATACCCTCCACTGTCCCACA 

HepCla #168 

SVEEACSLTPPHSAKSKFGYGAKDVRCHAR 
AGCGTCGAGGAAGCCTGTAGCCTCACCCCTCCCCATAGCGCTAAGTCCAAGTTTGGCTATGGCGCTAAGGATGTGAGATGCCATGCCAGA 

HepCla #119 

ISGIQYLAGLSTLPGNPAIASLMAFTAAVT
ATCTCCGGCATTCAGTATCTGGCTGGCCTCAGCACACTGCCTGGCAATCCCGCTATCGCTAGCCTCATGGCTTTCACAGCCGCTGTGACA 

HepCla #3 

QIVGGVYLLPRRGPRLGVRATRKTSERSQP
CAGATTGTGGGAGGCGTCTACCTCCTGCCTAGGAGAGGCCCTAGGCTCGGCGTCAGGGCTACCAGAAAGACAAGCGAAAGGTCCCAGCCT

HepCla #194 

LHSYSPGEINRVAACLRKLGVPPLRAWRHR 
CTGCATAGCTATAGCCCTGGCGAAATCAATAGGGTCGCCGCTTGCCTCAGGAAACTGGGAGTGCCTCCCCTCAGGGCTTGGAGACACAGA 

HepCla #189 

TARHTPVNSWLGNIIMFAPTLWARMILMTH
ACCGCTAGGCATACCCCTGTGAATAGCTGGCTGGGAAACATTATCATGTTCGCTCCCACACTGTGGGCCAGAATGATTCTGATGACCCAT 

HepCla #81 

ENLETTMRSPVFTDNSSPPAVPQSFQVAHL
GAGAATCTGGAAACCACAATGAGAAGCCCTGTGTTTACCGATAACTCCAGCCCTCCCGCTGTGCCTCAGTCCTTCCAAGTGGCTCACCTC 

HepCla #91 

ATPPGSVTVPHPNIEEVALSTTGEIPFYGK
GCCACACCCCCTGGCTCCGTGACAGTGCCTCACCCTAACATTGAGGAAGTGGCTCTGTCCACCACAGGCGAAATCCCTTTCTATGGCAAA
HepCla #60 

LVFDITKLLLAVFGPLWILQASLLKVPYFV
CTGGTCTTCGATATCACAAAGCTCCTGCTCGCCGTCTTCGGACCCCTCTGGATTCTGCAAGCCTCCCTGCTCAAGGTCCCCTATTTCGTC 

HepCla #23 

TAALVMAQLLRIPQAILDMIAGAHWGVLAG
ACCGCTGCCCTCGTGATGGCCCAACTGCTCAGGATTCCCCAAGCCATTCTGGATATGATTGCCGGAGCCCATTGGGGAGTGCTCGCCGGA

HepCla #98 

CNTCVTQTVDFSLDPTFTIETTTLPQDAVS 
TGCAATACCTGTGTGACACAGACAGTGGATTTCTCCCTGGATCCCACATTCACAATCGAAACCACAACCCTCCCCCAAGACGCTGTGTCC 

HepCla #109 

HGPTPLLYRLGAVQNEVTLTHPVTKYIMTC
CACGGACCCACACCCCTCCTGTATAGGCTCGGCGCTGTGCAAAACGAAGTGACACTGACACACCCTGTGACAAAGTATATCATGACCTGT 

HepCla #179 

ARVAIKSLTERLYVGGPLTNSRGENCGYRR
GCCAGAGTGGCTATCAAAAGCCTCACCGAAAGGCTCTACGTCGGCGGACCCCTCACCAATAGCAGAGGCGAAAACTGTGGCTATAGGAGA

HepCla #39 

CVIGGAGNNTLHCPTDCFRKHPEATYSRCG
TGCGTCATCGGAGGCGCTGGCAATAACACACTGCATTGCCCTACCGATTGCTTTAGGAAACACCCTGAGGCTACCTATAGCAGATGCGGA

HepCla #76 

TCGSSDLYLVTRHADVIPVRRRGDSRGSLL 
ACCTGTGGCTCCAGCGATCTGTATCTGGTCACCAGACACGCTGACGTCATCCCTGTGAGAAGGAGAGGCGATAGCAGAGGCTCCCTGCTC

HepCla #138

NMWSGTFPINAYTTGPCTPLPAPNYTFALW
AACATGTGGTCCGGCACATTCCCTATCAATGCCTATACCACAGGCCCTTGCACACCCCTCCCCGCTCCCAATTACACATTCGCTCTGTGG

HepCla #89 

HSTDATSILGIGTVLDQAETAGARLVVLAT
CACTCCACCGATGCCACAAGCATTCTGGGAATCGGAACCGTCCTGGATCAGGCTGAGACAGCCGGAGCCAGACTGGTCGTGCTCGCCACA 

HepCla #130 

YVPESDAAARVTAILSSLTVTQLLRRLHQW
TACGTCCCCGAAAGCGATGCCGCTGCCAGAGTGACAGCCATTCTGTCCAGCCTCACCGTCACCCAACTGCTCAGGAGACTGCATCAGTGG 

HepCla #8 

RPSWGPTDPRRRSRNLGKVIDTLTCGFADL 
AGGCCTAGCTGGGGCCCTACCGATCCCAGAAGGAGAAGCAGAAACCTCGGCAAAGTGATTGACACACTGACATGCGGATTCGCTGACCTC

HepCla #33 

GPDQRPYCWHYPPKPCGIVPAKSVCGPVYC
GGCCCTGACCAAAGGCCTTACTGTTGGCATTACCCTCCCAAACCCTGTGGCATTGTGCCTGCCAAAAGCGTCTGCGGACCCGTCTACTGT 

HepCla #115 

EECSQHLPYIEQGMMLAEQFKQKALGLLQT
GAGGAATGCTCCCAGCATCTGCCTTACATTGAGCAAGGCATGATGCTCGCCGAACAGTTTAAGCAAAAGGCTCTGGGACTGCTCCAGACA 

HepCla #107 

YQATVCARAQAPPPSWDQMWKCLIRLKPTL
TACCAAGCCACAGTGTGTGCCAGAGCCCAAGCCCCTCCCCCTAGCTGGGACCAAATGTGGAAGTGTCTGATTAGGCTCAAGCCTACCCTC

HepCla #34 

CGIVPAKSVCGPVYCFTPSPVVVGTTDRSG 
TGCGGAATCGTCCCCGCTAAGTCCGTGTGTGGCCCTGTGTATTGCTTTACCCCTAGCCCTGTGGTCGTGGGAACCACAGACAGAAGCGGA

HepCla #131 

SSLTVTQLLRRLHQWISSECTTPCSGSWLR
AGCTCCCTGACAGTGACACAGCTCCTGAGAAGGCTCCACCAATGGATTAGCTCCGAGTGTACCACACCCTGTAGCGGAAGCTGGCTGAGA

HepCla #161 

DLSDGSWSTVSSEAGTEDVVCCSMSYSWTG
GACCTCAGCGATGGCTCCTGGTCCACCGTCAGCTCCGAGGCTGGCACAGAGGATGTGGTCTGCTGTAGCATGAGCTATAGCTGGACCGGA

HepCla #108 

WDQMWKCLIRLKPTLHGPTPLLYRLGAVQN
TGGGATCAGATGTGGAAATGCCTCATCAGACTGAAACCCACACTGCATGGCCCTACCCCTCTGCTCTACAGACTGGGAGCCGTCCAGAAT





HepCla #116 

LAEQFKQKALGLLQTASRQAEVIAPAVQTN 
CTGGCTGAGCAATTCAAACAGAAAGCCCTCGGCCTCCTGCAAACCGCTAGCAGACAGGCTGAGGTCATCGCTCCCGCTGTGCAAACCAAT

HepCla #118 

WQKLEVFWAKHMWNFISGIQYLAGLSTLPG
TGGCAAAAGCTCGAGGTCTTCTGGGCCAAACACATGTGGAATTTCATTAGCGGAATCCAATACCTCGCCGGACTGTCCACCCTCCCCGGA

HepCla #129 

LIAFASRGNHVSPTHYVPESDAAARVTAIL
CTGATTGCCTTTGCCTCCAGGGGAAACCATGTGTCCCCCACACACTATGTGCCTGAGTCCGACGCTGCCGCTAGGGTCACCGCTATCCTC 

HepCla #19 

ATLCSALYVGDLCGSVFLVGQLFTFSPRRH
GCCACACTGTGTAGCGCTCTGTATGTGGGAGACCTCTGCGGAAGCGTCTTCCTCGTGGGACAGCTCTTCACATTCTCCCCGAGAAGGCAT 

HepCla #102

SSVLCECYDAGCAWYELTPAETTVRLRAYM 
AGCTCCGTGCTCTGCGAATGCTATGACGCTGGCTGTGCCTGGTACGAACTGACACCCGCTGAGACAACCGTCAGGCTCAGGGCTTACATG 

HepCla #122 

GWVAAQLAAPGAATAFVGAGLAGAAIGSVG 
GGCTGGGTGGCTGCCCAACTGGCTGCCCCTGGCGCTGCCACAGCCTTTGTGGGAGCCGGACTGGCTGGCGCTGCCATTGGCTCCGTGGGA

HepCla #29 

SWHINSTALNCNESLNTGWLAGLFYQHKFN
AGCTGGCACATTAACTCCACCGCTCTGAATTGCAATGAGTCCCTGAATACCGGATGGCTCGCCGGACTGTTTTACCAACACAAATTCAAT 

HepCla #164 

NALSNSLLRHHNLVYSTTSRSACQRQKKVT
AACGCTCTGTCCAACTCCGTGCTCAGGCATCACAATCTGGTCTACTCCACCACAAGCAGAAGCGCTTGCCAAAGGCAAAAGAAAGTGACA 

HepCla #1 

AAMSTNPKPQRKTKRNTNRRPQDVKFPGGG 
GCCGCTATGTCCACCAATCCCAAACCCCAAAGGAAAACCAAAAGGAATACCAATAGGAGACCCCAAGACGTCAAGTTTCCCGGAGGCGGA

HepCla #106 

SQTKQSGENFPYLVAYQATVCARAQAPPPS
AGCCAAACCAAACAGTCCGGCGAAAACTTTCCCTATCTGGTCGCCTATCAGGCTACCGTCTGCGCTAGGGCTCAGGCTCCCCCTCCCTCC 

HepCla #36

APTYSWGANDTDVFVLNNTRPPLGNWFGCT
GCCCCTACCTATAGCTGGGGCGCTAACGATACCGATGTGTTTGTGCTCAACAATACCAGACCCCCTCTGGGAAACTGGTTCGGATGCACA 

HepCla #156

VPPPRKKRTVVLTESTLSTALAELATKSFG
GTGCCTCCCCCTAGGAAAAAGAGAACCGTCGTGCTCACCGAAAGCACACTGTCCACCGCTCTGGCTGAGCTCGCCACAAAGTCCTTCGGA 

HepCla #165 

STTSRSACQRQKKVTFDRLQVLDSHYQDVL 
AGCACAACCTCCAGGTCCGCCTGTCAGAGACAGAAAAAGGTCACCTTTGACAGACTGCAAGTGCTCGACTCCCACTATCAGGATGTGCTC 

HepCla #90 

DQAETAGARLVVLATATPPGSVTVPHPNIE 
GACCAAGCCGAAACCGCTGGCGCTAGGCTCGTGGTCCTGGCTACCGCTACCCCTCCCGGAAGCGTCACCGTCCCCCATCCCAATATCGAA 

HepCla #141 

FHYVTGMTTDNLKCPCQVPSPEFFTELDGV
TTCCATTACGTCACCGGAATGACAACCGATAACCTCAAGTGTCCCTGTCAGGTCCCCTCCCCCGAATTCTTTACCGAACTGGATGGCGTC

HepCla #198 

LKLTPIAAAGRLDLSGWFTAGYSGGDIYHS
CTGAAACTGACACCCATTGCCGCTGCCGGAAGGCTCGACCTCAGCGGATGGTTTACCGCTGGCTATAGCGGAGGCGATATCTATCACTCC 

HepCla #117 

ASRQAEVIAPAVQTNWQKLEVFWAKHMWNF 
GCCTCCAGGCAAGCCGAAGTGATTGCCCCTGCCGTCCAGACAAACTGGCAGAAACTGGAAGTGTTTTGGGCTAAGCATATGTGGAACTTT 

HepCla #181 

CRASGVLTTSCGNTLTCYIKARAACRAAGL 
TGCAGAGCCTCCGGCGTCCTGACAACCTCCTGCGGAAACACACTGACATGCTATATCAAAGCCAGAGCCGCTTGCAGAGCCGCTGGCCTC 
HepCla #166

FDRLQVLDSHYQDVLKEVKAAASKVKANLL 
TTCGATAGGCTCCAGGTCCTGGATAGCCATTACCAAGACGTCCTGAAAGAGGTCAAGGCTGCCGCTAGCAAAGTGAAAGCCAATCTGCTC

HepCla #180 

GPLTNSRGENCGYRRCRASGVLTTSCGNTL 
GGCCCTCTGACAAACTCCAGGGGAGAGAATTGCGGATACAGAAGGTGTAGGGCTAGCGGAGTGCTCACCACAAGCTGTGGCAATACGCTC

HepCla #136 

IMHTRCHCGAEITGHVKNGTMRIVGPRTCR 
ATCATGCACACAAGGTGTCACTGTGGCGCTGAGATTACCGGACACGTCAAGAATGGCACAATGAGAATCGTCGGCCCTAGGACATGCAGA 

HepCla #144 

EVSFRVGLHEYPVGSQLPCEPEPDVAVLTS
GAGGTCAGCTTTAGGGTCGGCCTCCACGAATACCCTGTGGGAAGCCAACTGCCTTGCGAACCCGAACCCGATGTGGCTGTGCTCACCTCC 

HepCla #167 

KEVKAAASKVKANLLSVEEACSLTPPHSAK 
AAGGAAGTGAAAGCCGCTGCCTCCAAGGTCAAGGCTAACCTCCTGTCCGTGGAAGAGGCTTGCTCCCTGACACCCCCTCACTCCGCCAAA

HepCla #59 

GRDAVILLMCVVHPTLVFDITKLLLAVFGP
GGCAGAGACGCTGTGATTCTGCTCATGTGTGTGGTCCACCCTACCCTCGTGTTTGACATTACCAAACTGCTCCTGGCTGTGTTTGGCCCT

HepCla #146 

MLTDPSHITAEAAGRRLARGSPPSMASSSA 
ATGCTCACCGATCCCTCCCACATTACCGCTGAGGCTGCCGGAAGGAGACTGGCTAGGGGAAGCCCTCCCTCCATGGCTAGCTCCAGCGCT 

HepCla #78 

SPRPISYLKGSSGGPLLCPAGHAVGIFRAA
AGCCCTAGGCCTATCTCCTACCTCAAGGGAAGCTCCGGCGGACCCCTCCTGTGTCCCGCTGGCCATGCCGTCGGCATTTTCAGAGCCGCT

HepCla #32 

DFDQGWGPISYANGSGPDQRPYCWHYPPKP
GACTTTGACCAAGGCTGGGGCCCTATCTCCTACGCTAACGGAAGCGGACCCGATCAGAGACCCTATTGCTGGCACTATCCCCCTAAGCCT

HepCla #128 

RHVGPGEGAVQWMNRLIAFASRGNHVSPTH
AGGCATGTGGGACCCGGAGAGGGAGCCGTCCAGTGGATGAATAGGCTCATCGCTTTCGCTAGCAGAGGCAATCACGTCAGCCCTACCCAT 

HepCla #50 

CLWMMLLISQAEAALENLVILNAASLAGTH
TGCCTCTGGATGATGCTCCTGATTAGCCAAGCCGAAGCCGCTCTGGAAAACCTCGTGATTCTGAATGCCGCTAGCCTCGCCGGAACCCAT

HepCla #114 

IIPDREVLYREFDEMEECSQHLPYIEQGMM
ATCATTCCCGATAGGGAAGTGCTCTACAGAGAGTTTGACGAAATGGAAGAGTGTAGCCAACACCTCCCCTATATCGAACAGGGAATGATG 

HepCla #47 

LIHLHQNIVDVQYLYGVGSSIASWAIKWEY
CTGATTCACCTCCACCAAAACATTGTGGATGTGCAATACCTCTACGGAGTGGGAAGCTCCATCGCTAGCTGGGCCATTAAGTGGGAGTAT 

HepCla #200 

VSHARPRWFWFCLLLLAAGVGIYLLPNRAA
GTGTCCCACGCTAGGCCTAGGTGGTTCTGGTTCTGTCTGCTCCTGCTCGCCGCTGGCGTCGGCATTTACCTCCTGCCTAACAGAGCCGCT 

HepCla #85 

AATLGFGAYMSKAHGIDPNIRTGVRTITTG
GCCGCTACCCTCGGCTTTGGCGCTTACATGAGCAAAGCCCATGGCATTGACCCTAACATTAGGACAGGCGTCAGGACAATCACAACCGGA 

HepCla #62 

RVQGLLRICALARKMIGGHYVQMAIIKLGA
AGGGTCCAGGGACTGCTCAGGATTTGCGCTCTGGCTAGGAAAATGATTGGCGGACACTATGTGCAAATGGCTATCATTAAGCTCGGCGCT 

HepCla #153 

RRFAQALPVWARPDYNPPLVETWKKPDYEP 
AGGAGATTCGCTCAGGCTCTGCCTGTGTGGGCCAGACCCGATTACAATCCCCCTCTGGTCGAGACATGGAAAAAGCCTGACTATGAGCCT

HepCla #72 

TAAQTFLATCINGVCWTVYHGAGTRTIASP 
ACCGCTGCCCAAACCTTTCTGGCTACCTGTATCAATGGCGTCTGCTGGACCGTCTACCATGGCGCTGGCACAAGGACAATCGCTAGCCCT

HepCla #65 
WAHNGLRDLAVAVEPVVFSQMETKLITWGA
TGGGCTCACAATGGCCTCAGGGATCTGGCTGTGGCTGTGGAACCCGTCGTGTTTAGCCAAATGGAAACCAAACTGATTACCTGGGGCGCT 

HepCla #74 

KGPVIQMYTNVDQDLVGWPAPQGSRSLTPC 
AAGGGACCCGTCATCCAAATGTATACCAATGTGGATCAGGATCTGGTCGGCTGGCCCGCTCCCCAAGGCTCCAGGTCCCTGACACCCTGT

HepCla #151 

KVVILDSFDPLVAEEDEREISVPAEILRKS
AAGGTCGTGATTCTGGATAGCTTTGACCCTCTGGTCGCCGAAGAGGATGAGAGAGAGATTAGCGTCCCCGCTGAGATTCTGAGAAAGTCC

HepCla #64 

LTGTYVYNHLTPLRDWAHNGLRDLAVAVEP
CTGACAGGCACATACGTCTACAATCACCTCACCCCTCTGAGAGACTGGGCCCATAACGGACTGAGAGACCTCGCCGTCGCCGTCGAGCCT

HepCla #80 

VCTRGVAKAVDFIPVENLETTMRSPVFTDN
GTGTGTACCAGAGGCGTCGCCAAAGCCGTCGACTTTATCCCTGTGGAAAACCTCGAGACAACCATGAGGTCCCCCGTCTTCACAGACAAT 

HepCla #95 

ALGINAVAYYRGLDVSVIPTSGDVVVVATD 
GCCCTCGGCATTAACGCTGTGGCTTACTATAGGGGACTGGATGTGTCCGTGATTCCCACAAGCGGAGACGTCGTGGTCGTGGCTACCGAT

HepCla #111 

MSADLEVVTSTWVLVGGVLAALAAYCLSTG
ATGTCCGCCGATCTGGAAGTGGTCACCTCCACCTGGGTGCTCGTGGGAGGCGTCCTGGCTGCCCTCGCCGCTTACTGTCTGTCCACCGGA

HepCla #97 

ALMTGYTGDFDSVIDCNTCVTQTVDFSLDP 
GCCCTCATGACAGGCTATACCGGAGACTTTGACTCCGTGATTGACTGTAACACATGCGTCACCCAAACCGTCGACTTTAGCCTCGACCCT 

HepCla #2 

NTNRRPQDVKFPGGGQIVGGVYLLPRRGPR
AACACAAACAGAAGGCCTCAGGATGTGAAATTCCCTGGCGGAGGCCAAATCGTCGGCGGAGTGTATCTGCTCCCCAGAAGGGGACCCAGA 

HepCla #11 

RALAHGVRVLEDGVNYATGNLPGCSFSIFL
AGGGCTCTGGCTCACGGAGTGAGAGTGCTCGAGGATGGCGTCAACTATGCCACAGGCAATCTGCCTGGCTGTAGCTTTAGCATTTTCCTC

HepCla #169 

SKFGYGAKDVRCHARKAVAHINSVWKDLLE
AGCAAATTCGGATACGGAGCCAAAGACGTCAGGTGTCACGCTAGGAAAGCCGTCGCCCATATCAATAGCGTCTGGAAAGACCTCCTGGAA 

HepCla #28 

TPGAKQNIQLINTNGSWHINSTALNCNESL
ACCCCTGGCGCTAAGCAAAACATTCAGCTCATCAATACCAATGGCTCCTGGCATATCAATAGCACAGCCCTCAACTGTAACGAAAGCCTC 

HepCla #30 

NTGWLAGLFYQHKFNSSGCPERLASCRRLT
AACACAGGCTGGCTGGCTGGCCTCTTCTATCAGCATAAGTTTAACTCCAGCGGATGCCCTGAGAGACTGGCTAGCTGTAGGAGACTGACA

HepCla #49 

VVLLFLLLADARVCSCLWMMLLISQAEAAL
GTGGTCCTGCTCTTCCTCCTGCTCGCCGATGCCAGAGTGTGTAGCTGTCTGTGGATGATGCTGCTCATCTCCCAGGCTGAGGCTGCCCTC 

HepCla #192 

DCEIYGACYSIEPLDLPPIIQRLHGLSAFS
GACTGTGAGATTTACGGAGCCTGTTACTCCATCGAACCCCTCGACCTCCCCCCTATCATTCAGAGACTGCATGGCCTCAGCGCTTTCTCC

HepCla #73 

WTVYHGAGTRTIASPKGPVIQMYTNVDQDL
TGGACAGTGTATCACGGAGCCGGAACCAGAACCATTGCCTCCCCCAAAGGCCCTGTGATTCAGATGTACACAAACGTCGACCAAGACCTC

HepCla #101 

YRFVAPGERPSGMFDSSVLCECYDAGCAWY 
TACAGATTCGTCGCCCCTGGCGAAAGGCCTAGCGGAATGTTTGACTCCAGCGTCCTGTGTGAGTGTTACGATGCCGGATGCGCTTGGTAT

HepCla #45 

RSELSPLLLSTTQWQVLPCSFTTLPALSTG
AGGTCCGAGCTCAGCCCTCTGCTCCTGTCCACCACACAGTGGCAGGTCCTGCCTTGCTCCTTCACAACCCTCCCCGCTCTGTCCACCGGA 

HepCla #195 

LRKLGVPPLRAWRHRARSVRARLLARGGRA
CTGAGAAAGCTCGGCGTCCCCCCTCTGAGAGCCTGGAGGCATAGGGCTAGGTCCGTGAGAGCCAGACTGCTCGCCAGAGGCGGAAGGGCT
HepCla #121 

SPLTTSQTLLFNILGGWVAAQLAAPGAATA
AGCCCTCTGACAACCTCCCAGACACTGCTCTTCAATATCCTCGGCGGATGGGTCGCCGCTCAGCTCGCCGCTCCCGGAGCCGCTACCGCT

HepCla #61 

LWILQASLLKVPYFVRVQGLLRICALARKM
CTGTGGATCCTCCAGGCTAGCCTCCTGAAAGTGCCTTACTTTGTGAGAGTGCAAGGCCTCCTGAGAATCTGTGCCCTCGCCAGAAAGATG

HepCla #137 

VKNGTMRIVGPRTCRNMWSGTFPINAYTTG 
GTGAAAAACGGAACCATGAGGATTGTGGGACCCAGAACCTGTAGGAATATGTGGAGCGGAACCTTTCCCATTAACGCTTACACAACCGGA 

HepCla #92 

EVALSTTGEIPFYGKAIPLEVIKGGRHLIF
GAGGTCGCCCTCAGCACAACCGGAGAGATTCCCTTTTACGGAAAGGCTATCCCTCTGGAAGTGATTAAGGGAGGCAGACACCTCATCTTT

HepCla #188 

LTRDPTTPLARAAWETARHTPVNSWLGNII
CTGACAAGGGATCCCACAACCCCTCTGGCTAGGGCTGCCTGGGAGACAGCCAGACACACACCCGTCAACTCCTGGCTCGGCAATATCATT

HepCla #140 

RVSAEEYVEIRRVGDFHYVTGMTTDNLKCP
AGGGTCAGCGCTGAGGAATACGTCGAGATTAGGAGAGTGGGAGACTTTCACTATGTGACAGGCATGACCACAGACAATCTGAAATGCCCT

HepCla #155 

PVVHGCPLPPPRSPPVPPPRKKRTVVLTES 
CCCGTCGTGCATGGCTGTCCCCTCCCCCCTCCCAGAAGCCCTCCCGTCCCCCCTCCCAGAAAGAAAAGGACAGTGGTCCTGACAGAGTCC 

HepCla #157 

TLSTALAELATKSFGSSSTSGITGDNTTTS
ACCCTCAGCACAGCCCTCGCCGAACTGGCTACCAAAAGCTTTGGCTCCAGCTCCACCTCCGGCATTACCGGAGACAATACCACAACCTCC

HepCla #135 

VSCQRGYKGVWRGDGIMHTRCHCGAEITGH 
GTGTCCTGCCAAAGGGGATACAAAGGCGTCTGGAGAGGCGATGGCATTATGCATACCAGATGCCATTGCGGAGCCGAAATCACAGGCCAT 

HepCla #20 

VFLVGQLFTFSPRRHWTTQGCNCSIYPGHI
GTGTTTCTGGTCGGCCAACTGTTTACCTTTAGCCCTAGGAGACACTGGACCACACAGGGATGCAATTGCTCCATCTATCCCGGACACATT 

HepCla #123 

FVGAGLAGAAIGSVGLGKVLVDILAGYGAG
TTCGTCGGCGCTGGCCTCGCCGGAGCCGCTATCGGAAGCGTCGGCCTCGGCAAAGTGCTCGTGGATATCCTCGCCGGATACGGAGCCGGA 

HepCla #133 

DIWDWICEVLSDFKTWLKAKLMPQLPGIPF
GACATTTGGGATTGGATTTGCGAAGTGCTCAGCGATTTCAAAACCTGGCTGAAAGCCAAACTGATGCCCCAACTGCCTGGCATTCCCTTT 

HepCla #15 

NSSIVYEAADAILHTPGCVPCVREGNASRC 
AACTCCAGCATTGTGTATGAGGCTGCCGATGCCATTCTGCATACCCCTGGCTGTGTGCCTTGCGTCAGGGAAGGCAATGCCTCCAGGTGT 

HepCla #31 

SSGCPERLASCRRLTDFDQGWGPISYANGS 
AGCTCCGGCTGTCCCGAAAGGCTCGCCTCCTGCAGAAGGCTCACCGATTTCGATCAGGGATGGGGACCCATTAGCTATGCCAATGGCTCC

HepCla #178 

RTEEAIYQCCDLDPQARVAIKSLTERLYVG 
AGGACAGAGGAAGCCATTTACCAATGCTGTGACCTCGACCCTCAGGCTAGGGTCGCCATTAAGTCCCTGACAGAGAGACTGTATGTGGGA 

HepCla #69 

VSKGWRLLAPITAYAQQTRGLLGCIITSLT
GTGTCCAAGGGATGGAGACTGCTCGCCCCTATCACAGCCTATGCCCAACAGACAAGGGGACTGCTCGGCTGTATCATTACCTCCCTGACA 

HepCla #191 

FFSVLIARDQLEQALDCEIYGACYSIEPLD 
TTCTTTAGCGTCCTGATTGCCAGAGACCAACTGGAACAGGCTCTGGATTGCGAAATCTATGGCGCTTGCTATAGCATTGAGCCTCTGGAT 

HepCla #142 

CQVPSPEFFTELDGVRLHRFAPPCKPLLRE 
TGCCAAGTGCCTAGCCCTGAGTTTTTCACAGAGCTCGACGGAGTGAGACTGCATAGGTTTGCCCCTCCCTGTAAGCCTCTGCTCAGGGAA 
HepCla #182

TCYIKARAACRAAGLQDCTMLVCGDDLVVI
ACCTGTTACATTAAGGCTAGGGCTGCCTGTAGGGCTGCCGGACTGCAAGACTGTACCATGCTGGTCTGCGGAGACGATCTGGTCGTGATT

HepCla #86 

IDPNIRTGVRTITTGSPITYSTYGKFLADG
ATCGATCCCAATATCAGAACCGGAGTGAGAACCATTACCACAGGCTCCCCCATTACCTATAGCACATACGGAAAGTTTCTGGCTGACGGA 

HepCla #44 

CNWTRGERCDLEDRDRSELSPLLLSTTQWQ
TGCAATTGGACAAGGGGAGAGAGATGCGATCTGGAAGACAGAGACAGAAGCGAACTGTCCCCCCTCCTGCTCAGCACAACCCAATGGCAA

HepCla #22 

TGHRMAWDMMMNWSPTAALVMAQLLRIPQA
ACCGGACACAGAATGGCTTGGGATATGATGATGAATTGGTCCCCCACAGCCGCTCTGGTCATGGCTCAGCTCCTGAGAATCCCTCAGGCT

HepCla #127 

PGALVVGVVCAAILRRHVGPGEGAVQWMNR
CCCGGAGCCCTCGTGGTCGGCGTCGTGTGTGCCGCTATCCTCAGGAGACACGTCGGCCCTGGCGAAGGCGCTGTGCAATGGATGAACAGA

HepCla #149 

HDSPDAELIEANLLWRQEMGGNITRVESEN 
CACGATAGCCCTGACGCTGAGCTCATCGAAGCCAATCTGCTCTGGAGACAGGAAATGGGAGGCAATATCACAAGGGTCGAGTCCGAGAAT

HepCla #105 

EGVFTGLTHIDAHFLSQTKQSGENFPYLVA
GAGGGAGTGTTTACCGGACTGACACACATTGACGCTCACTTTCTGTCCCAGACAAAGCAAAGCGGAGAGAATTTCCCTTACCTCGTGGCT 

HepCla #5 

RGRRQPIPKARRPEGRTWAQPGYPWPLYGN
AGGGGAAGGAGACAGCCTATCCCTAAGGCTAGGAGACCCGAAGGCAGAACCTGGGCGCAACCCGGATACCCTTGGCCTCTGTATGGCAAT 

HepCla #173 

LIVFPDLGVRVCEKMALYDVVSKLPLAVMG
CTGATTGTGTTTCCCGATCTGGGAGTGAGAGTGTGTGAGAAAATGGCTCTGTATGACGTCGTGTCCAAGCTCCCCCTCGCCGTCATGGGA 

HepCla #12 

YATGNLPGCSFSIFLLALLSCLTVPASAYQ
TACGCTACCGGAAACCTCCCCGGATGCTCCTTCTCCATCTTTCTGCTCGCCCTCCTGTCCTGCCTCACCGTCCCCGCTAGCGCTTACCAA 

HepCla #124 

LGKVLVDILAGYGAGVAGALVAFKIMSGEV
CTGGGAAAGGTCCTGGTCGACATTCTGGCTGGCTATGGCGCTGGCGTCGCCGGAGCCCTCGTGGCTTTCAAAATCATGAGCGGAGAGGTC

HepCla #160 

SYSSMPPLEGEPGDPDLSDGSWSTVSSEAG
AGCTATAGCTCCATGCCTCCCCTCGAGGGAGAGCCTGGCGATCCCGATCTGTCCGACGGAAGCTGGAGCACAGTGTCCAGCGAAGCCGGA 

HepCla #150

RQEMGGNITRVESENKVVILDSFDPLVAEE 
AGGCAAGAGATGGGCGGAAACATTACCAGAGTGGAAAGCGAAAACAAAGTGGTCATCCTCGACTCCTTCGATCCCCTCGTGGCTGAGGAA 

HepCla #75 

VGWPAPQGSRSLTPCTCGSSDLYLVTRHAD
GTGGGATGGCCTGCCCCTCAGGGAAGCAGAAGCCTCACCCCTTGCACATGCGGAAGCTCCGACCTCTACCTCGTGACAAGGCATGCCGAT

HepCla #88 

GCSGGAYDIIICDECHSTDATSILGIGTVL 
GGCTGTAGCGGAGGCGCTTACGATATCATTATCTGTGACGAATGCCATAGCACAGACGCTACCTCCATCCTCGGCATTGGCACAGTGCTC

HepCla #99 

TFTIETTTLPQDAVSRTQRRGRTGRGKPGI
ACCTTTACCATTGAGACAACCACACTGCCTCAGGATGCCGTCAGCAGAACCCAAAGGAGAGGCAGAACCGGAAGGGGAAAGCCTGGCATT 

HepCla #40 

DCFRKHPEATYSRCGSGPWITPRCLVDYPY
GACTGTTTCAGAAAGCATCCCGAAGCCACATACTCCAGGTGTGGCTCCGGCCCTTGGATTACCCCTAGGTGTCTGGTCGACTATCCCTAT

HepCla #201

LAAGVGIYLLPNRAA
CTGGCTGCCGGAGTGGGAATCTATCTGCTCCCCAATAGGGCTGCC 
HepCla #163

ALVTPCAAEEQKLPINALSNSLLRHHNLVY
GCCCTCGTGACACCCTGTGCCGCTGAGGAACAGAAACTGCCTATCAATGCCCTCAGCAATAGCCTCCTGAGACACCATAACCTCGTGTAT 

HepCla #132 

ISSECTTPCSGSWLRDIWDWICEVLSDFKT 
ATCTCCAGCGAATGCACAACCCCTTGCTCCGGCTCCTGGCTCAGGGATATCTGGGACTGGATCTGTGAGGTCCTGTCCGACTTTAAGACA 

HepCla #134 

WLKAKLMPQLPGIPFVSCQRGYKGVWRGDG
TGGCTCAAGGCTAAGCTCATGCCTCAGCTCCCCGGAATCCCTTTCGTCAGCTGTCAGAGAGGCTATAAGGGAGTGTGGAGGGGAGACGGA 

HepCla #41 

SGPWITPRCLVDYPYRLWHYPCTINYTIFK
AGCGGACCCTGGATCaCACCCAGATGCCTCGTGGATTACCCTTACAGACTGTGGCACTATCCCTGTACCATTAACTATACC^ 

Artificial Protein: 



VIPVRRRGDSRGSIjIjSPRPISYLKGSSGGPARRGRBIIiIjGPADGMVSKGWRLIjAPITAYARIiHRFAPPCKPLL^ 
KrjITWGADTAACGDIINGLPVSLLCPAGHAVGIFRAAVCTRGVAKAVDFIPVCWIVGRIVLSGKPAIIPDREVIiYREFDEMPCT^ 

VSAEEYVEIRRVGDALYDWSKLPLAVMGSSYGFQYSPGQRVEPISWCLWWLQYFLTRVEAQLHVWVPPLN^ 

CFAWYLIiPPIXQRIiHGIjSAFSLHSYSPGEimVAACNPPLVETWKKPDYEPPVVHGCPLPPPRSPPGVGSSIASWAIKWEYWLnFm 

NTRPPLGl^FGCaTWMNSTGFTKVCGAPPFTEAMTRYSAPPGDPPQPEYDLELITSCSSWPLLLLLLALPQRAYALDTEVAASCGGW 

ITSLT6RDKMQVEGEVQIVSSSPPAVPQSFQVAHraHAPTGSGKSTKVPAANTPGriPVCQDHLEFWEGVFTGLTHIDAHFLVI.LL 

AGRTTSGLVSIiIiEVTIiTHPVTKYIMTCMSADLEWTSTWLWGLMALTLSPYYKRYISWCLl^ 

LDLSXAYFS^TVGmAK^VVIJIJLFAGVDAETHVTRIARGSPPSMASSSASQLSAPSLK^ 

EPEPDVAVIjTSMDTDPSHITAEAAGRDSVTPIDTTIMAKlSrEVFCVQPEKGGRKPARYAAQGYiCVLVLNPSVAATLGPGAyM 

DCPNSSrVYEAADAILHTSSYGFQYSPGQRVEPLVQAWKSKKTPMGFSDTAACta^IINGIiPVSARRGREIIjIX^ 

AEIiIEANIiLWPArASLMAFTAAVTSPriTTSQTriLFNXLGIiVQAWKSKKTPMGFSYDTRCFDST^ 

DYMPAPTLWARMIIiMTHFFSVrjIARDQIiEQALSVIPTSGDVV\n/ATDAljMTGYTGDFDSVIDaHSKI^ 

TTIiPALSTGIilHIiHQNrVDVQYriYKGRWVPGAWAIiYGMWPIiLnLIiL^ 

AAICGKYLFNWAVRTKKAVAHINSWKDLIjEDSVTPIDTTIMAECNEPTPSPVWGTTDRSGAPTO 

VATRDGKLQDGTMriVCGDDLWICESAGVQEDAASLRAVAGALVAFKIMSGEVPSTEDLWLIiPAILSYDTRCFDSTVTESDIRTEEA 

EIiTPAETTWIiRAYMNTPGLPVCQDHLEFWPQPEYDLELITSCSSWSVAHDGAGKa^VYYLiGKVIDTLTCGFADL^ 

GGRHLIFCHSKKKCDELAAKLVGGVIiAAriAAYCIjSTGCWIVGRIVLSGKPACESAGVQEDAASLRAFT 

SHARPRWFWFCIililiSSSTSGITGDNTTrSSEPAPSGCPPDSDAERTQRRGRTGRGKPGIYRFVAPGERPSGMFDVRiynirVGGVEHRLEAACK^ 

DLEDRDEAQLHVWVPPLNVRGGRDAVIIiLMCVVHPTLGVRATRKTSERSQPRGRRQPXPK^ 

PAPSGCPPDSDAESYSSMPPLEGEPGDPIGGHYVQMAIIKIiGAIiTGTYVYimiiTPIUitDPSTBDriVNIjIliPA 

GVLAGIAYPSMVGWWAKVIiVEGCGWAQWLriSPRGSRPSWGPTDPRRRSRNWTTQGCNCSIYPGHITG 

QLRRHIDLLVGSRLWHYPCTINYTIFKVRMYVGGVEHRLEAAVFCVQPEKGGRKPARLIVFPDLGVRVCEKMMGYIPLVGAPLGGAARALAHGVRVLE

DGVNGGNAGRTTSGLVSLLTPGAKQNIQLINTNGLALLSCLTVPASAYQVRNSTGLYHVTNDCPGRDKNQVEGEVQIVSTAAQTFLATCINGVCPATQ

LRRHIDLLVGSATLCSALYVGDLCGSHAPTGSGKSTKVPAAYAAQGYKVLVLNPSVRTWAQPGYPWPLYGNEGCGWAGWLLSPRGSTEDVVCCSMSYS

WTGALVTPCAAEEQKLPIALDTEVAASCGGVVLVGLMALTLSPYYKRYWMNSTGFTKVCGAPPCVIGGAGNNTLHCPTSVEEACSLTPPHSAKSKFGY

GAKDVRCHARISGIQYLAGLSTLPGNPAIASLMAFTAAVTQIVGGVYLLPRRGPRLGVRATRKTSERSQPLHSYSPGEINRVAACLRKLGVPPLRAWR

HRTARHTPVNSWLGNIIMFAPTLWARMILMTHENLETTMRSPVFTDNSSPPAVPQSFQVAHLATPPGSVTVPHPNIEEVALSTTGEIPFYGKLVFDIT

KLLLAVFGPLWILQASLLKVPYFVTAALVMAQLLRIPQAILDMIAGAHWGVLAGCNTCVTQTVDFSLDPTFTIETTTLPQDAVSHGPTPLLYRLGAVQ

NEVTLTHPVTKYIMTCARVAIKSLTERLYVGGPLTNSRGENCGYRRCVIGGAGNNTLHCPTDCFRKHPEATYSRCGTCGSSDLYLVTRHADVIPVRRR
GDSRGSLLNMWSGTFPINAYTTGPCTPLPAPNYTFALWHSTDATSILGIGTVLDQAETAGARLVVLATYVPESDAAARVTAILSSLTVTQLLRRLHQW

RPSWGPTDPRRRSRNLGKVIDTLTCGFADLGPDQRPYCWHYPPKPCGIVPAKSVCGPVYCEECSQHLPYIEQGMMLAEQFKQKALGLLQTYQATVCAR

AQAPPPSWDQMWKCLIRLKPTLCGIVPAKSVCGPVYCFTPSPVVVGTTDRSGSSLTVTQLLRRLHQWISSECTTPCSGSWLRDLSDGSWSTVSSEAGT

EDVVCCSMSYSWTGWDQMWKCLIRLKPTLHGPTPLLYRLGAVQNLAEQFKQKALGLLQTASRQAEVIAPAVQTNWQKLEVFWAKHMWNFISGIQYLAG

LSTLPGLIAFASRGNHVSPTHYVPESDAAARVTAILATLCSALYVGDLCGSVFLVGQLFTFSPRRHSSVLCECYDAGCAWYELTPAETTVRLRAYMGW

VAAQLAAPGAATAFVGAGLAGAAIGSVGSWHINSTALNCNESLNTGWLAGLFYQHKFNNALSNSLLRHHNLVYSTTSRSACQRQKKVTAAMSTNPKPQ

RKTKRNTNRRPQDVKFPGGGSQTKQSGENFPYLVAYQATVCARAQAPPPSAPTYSWGANDTDVFVLNNTRPPLGNWFGCTVPPPRKKRTVVLTESTLS

TALAELATKSFGSTTSRSACQRQKKVTFDRLQVLDSHYQDVLDQAETAGARLVVLATATPPGSVTVPHPNIEFHYVTGMTTDNLKCPCQVPSPEFFTE

LDGVLKLTPIAAAGRLDLSGWFTAGYSGGDIYHSASRQAEVIAPAVQTNWQKLEVFWAKHMWNFCRASGVLTTSCGNTLTCYIKARAACRAAGLFDRL

QVLDSHYQDVLKEVKAAASKVKANLLGPLTNSRGENCGYRRCRASGVLTTSCGNTLIMHTRCHCGAEITGHVKNGTMRIVGPRTCREVSFRVGLHEYP

VGSQLPCEPEPDVAVLTSKEVKAAASKVKANLLSVEEACSLTPPHSAKGRDAVILLMCVVHPTLVFDITKLLLAVFGPMLTDPSHITAEAAGRRLARG

SPPSMASSSASPRPISYLKGSSGGPLLCPAGHAVGIFRAADFDQGWGPISYANGSGPDQRPYCWHYPPKPRHVGPGEGAVQWMNRLIAFASRGNHVSP

THCLWMMLLISQAEAALENLVILNAASLAGTHIIPDREVLYREFDEMEECSQHLPYIEQGMMLIHLHQNIVDVQYLYGVGSSIASWAIKWEYVSHARP

RWFWFCLLLLAAGVGIYLLPNRAAAATLGFGAYMSKAHGIDPNIRTGVRTITTGRVQGLLRICALARKMIGGHYVQMAIIKLGARRFAQALPVWARPD

YNPPLVETWKKPDYEPTAAQTFLATCINGVCWTVYHGAGTRTIASPWAHNGLRDLAVAVEPVVFSQMETKLITWGAKGPVIQMYTNVDQDLVGWPAPQ

GSRSLTPCKVVILDSFDPLVAEEDEREISVPAEILRKSLTGTYVYNHLTPLRDWAHNGLRDLAVAVEPVCTRGVAKAVDFIPVENLETTMRSPVFTDN

ALGINAVAYYRGLDVSVIPTSGDVVVVATDMSADLEVVTSTWVLVGGVLAALAAYCLSTGALMTGYTGDFDSVIDCNTCVTQTVDFSLDPNTNRRPQD

VKFPGGGQIVGGVYLLPRRGPRRALAHGVRVLEDGVNYATGNLPGCSFSIFLSKFGYGAKDVRCHARKAVAHINSVWKDLLETPGAKQNIQLINTNGS

WHINSTALNCNESLNTGWLAGLFYQHKFNSSGCPERLASCRRLTVVLLFLLLADARVCSCLWMMLLISQAEAALDCEIYGACYSIEPLDLPPIIQRLH

GLSAFSWTVYHGAGTRTIASPKGPVIQMYTNVDQDLYRFVAPGERPSGMFDSSVLCECYDAGCAWYRSELSPLLLSTTQWQVLPCSFTTLPALSTGLR

KLGVPPLRAWRHRARSVRARLLARGGRASPLTTSQTLLFNILGGWVAAQLAAPGAATALWILQASLLKVPYFVRVQGLLRICALARKMVKNGTMRIVG

PRTCRNMWSGTFPINAYTTGEVALSTTGEIPFYGKAIPLEVIKGGRHLIFLTRDPTTPLARAAWETARHTPVNSWLGNIIRVSAEEYVEIRRVGDFHY
VTGMTTDNLKCPPVVHGCPLPPPRSPPVPPPRKKRTVVLTESTLSTALAELATKSFGSSSTSGITGDNTTTSVSCQRGYKGVWRGDGIMHTRCHCGAE
ITGHVFLVGQLFTFSPRRHWTTQGCNCSIYPGHIFVGAGLAGAAIGSVGLGKVLVDILAGYGAGDIWDWICEVLSDFKTWLKAKLMPQLPGIPFNSSI
VYEAADAILHTPGCVPCVREGNASRCSSGCPERLASCRRLTDFDQGWGPISYANGSRTEEAIYQCCDLDPQARVAIKSLTERLYVGVSKGWRLLAPIT
AYAQQTRGLLGCIITSLTFFSVLIARDQLEQALDCEIYGACYSIEPLDCQVPSPEFFTELDGVRLHRFAPPCKPLLRETCYIKARAACRAAGLQDCTM
LVCGDDLVVIIDPNIRTGVRTITTGSPITYSTYGKFLADGCNWTRGERCDLEDRDRSELSPLLLSTTQWQTGHRMAWDMMMNWSPTAALVMAQLLRIP

QAPGALVVGVVCAAILRRHVGPGEGAVQWMNRHDSPDAELIEANLLWRQEMGGNITRVESENEGVFTGLTHIDAHFLSQTKQSGENFPYLVARGRRQP

IPKARRPEGRTWAQPGYPWPLYGNLIVFPDLGVRVCEKMALYDVVSKLPLAVMGYATGNLPGCSFSIFLLALLSCLTVPASAYQLGKVLVDILAGYGA

GVAGALVAFKIMSGEVSYSSMPPLEGEPGDPDLSDGSWSTVSSEAGRQEMGGNITRVESENKVVILDSFDPLVAEEVGWPAPQGSRSLTPCTCGSSDL
YLVTRHADGCSGGAYDIIICDECHSTDATSILGIGTVLTFTIETTTLPQDAVSRTQRRGRTGRGKPGIDCFRKHPEATYSRCGSGPWITPRCLVDYPY
LAAGVGIYLLPNRAAALVTPCAAEEQKLPINALSNSLLRHHNLVYISSECTTPCSGSWLRDIWDWICEVLSDFKTWLKAKLMPQLPGIPFVSCQRGYK
GVWRGDGSGPWITPRCLVDYPYRLWHYPCTINYTIFK 

Artificial DNA: 



GTGATTCCCGTCAGGAGAAGGGGAGACTCCAGGGGAAGCCTCCTGTCCCCCAGACCCATTAGCTATCTGAAAGGCTCCAGCGGAGGCCCTGCCAGAAG 
GGGAAGGGAAATCCTCCTGGGACCCGCTGACGGAATGGTCAGCAAAGGCTGGAGGCTCCTGGCTCCCATTACCGCTTACGCTAGGCTCCACAGATTCG 
CTCCCCCTTGCAAACCCCTCCTGAGAGAGGAAGTGTCCTTCAGAGTGGGACTGCATGAGTATCCCGTCGGCTCCGTGGTCTTCTCCCAGATGGAGACA 
AAGCTCATCACATGGGGAGCCGATACCGCTGCCTGTGGCGATATCATTAACGGACTGCCTGTGTCCCTGCTCTGCCCTGCCGGACACGCTGTGGGAAT 
CTTTAGGGCTGCCGTCTGCACAAGGGGAGTGGCTAAGGCTGTGGATTTCATTCCCGTCTGCGTCGTGATTGTGGGAAGGATTGTGCTCAGCGGAAAGC 
CTGCCATTATCCCTGACAGAGAGGTCCTGTATAGGGAATTCGATGAGATGCCCTGTACCCCTCTGCCTGCCCCTAACTATACCTTTGCCCTCTGGAGA 
GTGTCCGCCGAAGAGTATGTGGAAATCAGAAGGGTCGGCGATGCCCTCTACGATGTGGTCAGCAAACTGCCTCTGGCTGTGATGGGCTCCAGCTATGG 
CTTTCAGTATAGCCCTGGCCAAAGGGTCGAGTTTATCTCCTGGTGTCTGTGGTGGCTCCAGTATTTCCTCACCAGAGTGGAAGCCCAACTGCATGTGT 
GGGTGCCTCCCCTCAACGTCAGGGGAGAGAATCTGGTCATCCTCAACGCTGCCTCCCTGGCTGGCACACACGGACTGGTCAGCTTTCTGGTCTTCTTT
TGCTOTGCCTOSTACCTCCTGCCTCCCATTATCCAAAGGCTCCACGGACTGTCCGCCT^ 

GGCTGCCTGTAACCCTCCCCTCGTGGAAACCTGGAAGAAACCCGATTACGAACCCCCTGTGGTCCACGGATGCCCTCTGCCTCCCCCTAGGTCCCCCC 

CTGGCGTCGGCTCCAGCATTGCCTCCTGGGCTATOU^TGGGAATACGTCGTGCTCCTGTTTCTGCTCCTGGCTGACGCTAGGGTCTGCTCCCTGAAT 

AACACAAGGCCTCCCCTCGGCAATTGGTTTGGCTGTACCTGGATGAATAGCACAGGCTTTACCAAAGTGTGTGGCGCTCCCCCTTTCA^ 

GACAAGGTATAGCGCTCCCCCTGGCGATCCCCCTCAGCCTGAGTATGACCTCGAGCTCATCACAAGCTGTAGCTCCTGGCCTCTGCTCCTGCTCCTGC

TCGCCCTCCCCCaAAGGGCTTACGCTCTGGATACCGAAGTGGCTGCCTCCTGCGGAGGCGTCGTGCTCCAGCAAACCAGAGGCCTCCTGGGATGC^ 

ATCACAAGCCTCACCGGAAGGGATAAGAATCAGGTCGAGGGAGAGGTCCAGATTGTGTCCAGCTCCCCCCCTGCCGTCCCCCaAAGCTTTCTl 

CCATCTGCATGCCCCTACCGGAAGCGGAAAGTCCACCAAAGTGCCTGCCGCTAACACACCCGGACTGCCTGTGTGTCAGGATCACCTCGAGTTTTGGG

AAGGCGTCTTCACAGGCCTCACCCATATCGATGCCCATTTCCTCGTGCTCCTGCTCTTCGCTGGCGTCGACGCTGAGACACACGTCACCGGAGGCAAT 

GCCGGAAGGACAACCTCCGGCCTCGTGTCCCTGCTCGAGGTCACCCTCACCCATCCCGTCACCAAATACATTATGACATGCATGAGCGCTGACCTCGA 

GGTCGTGACAAGCACATGGGTCCTGGTCGTGGGACTGATGGCCCTCACCCTCAGCCCTTACTATAAGAGATACATTAGCTGGTGCCTCTGGTGGCTGC 

AATACTTTCTGACAAGGGTCGCCATTTGCGGAAAGTATCTGTTTAACTGGGCCGTCAGGACAAAGCTCAAGCTCACCCCTATCGCTGCCGCTGGCAGA 

CTGGATCTGTCCATCGCTTACTTTAGCATGGTGGGAAACTGGGCCAAAGTGCTCGTGGTCCTGCTCCTGTTTGCCGGAGTGGATGCCGAAACCCATGT 

GACAAGGCTCGCCAGAGGCTCCCCCCCTAGCATGGCCTCCAGCTCCGCCTCCCAGCTCAGCGCTCCCTCCCTGAAAGCCACATGCACAGCCAATGGCC 

TCGTGTCCTTCCTCGTGTTTTTCTGTTTCGCTTGGTATCTGAAAGGCAGATGGGTCCCCGGAGCCGTCTACGCTCTGTATGGCATGCAGCTCCCCTGT 

GAGCCTGAGCCTGACGTCGCCGTCCTGACAAGCATGCTGACAGACCCTAGCCATATCACAGCCGAAGCCGCTGGCAGAGACTCCGTGACACCCATTGA 

CACaACCATrATGGCTAAGAATGAGGTCTTCTGTGTGCAACCCGAA?^GGGAGGCAGAAAGCCTGCCAGATACGCTGCCCAAGGCTATAAGG^ 

TCCrXGJ^TCCCrCCGTGGCTGCCACACTGGGATTCGGAGCCTATATGTCCAAGGCTCACGGAGTGAGA^ 

GACTGTCCCaATAGCTCmTCGTCTACGAAGCCGCTGACGCTATCCTCCACACAAGCTCCTACGGATTCCAATACTCCCCCGGACAGAGAGTGGAA 

CCTCGTGCT^GCCTGGAAGTCCAAGAAAACCCCTATGGGATTCTCCGACACAGCCGCTTGCGGAGACATTATCAATGGCCTCCCCGTCAG 

GAGGCAGAGAGATTCTGCTCGGCCCTGCCGATGGCATGAGCCAACTGTCCGCCCCTAGCCTCAAGGCTACCTGTACCGCTAACC^ 

GCCGAACTGATTGAGGCTAACCTCCTGTGGAACCCTGCCATTGCCTCCCTGATGGCCTTTACCGCTGCCGTCACCTCCCCCCTCACCACAAGCCAAAC

CCTCCTGTTTAACaTTCTGGGACTGGTCCAGGCTTGGAAAAGCAAAAAGACACCCATGGGCTTTAGCTATGACACAAGGTGTTTCGATAGavC^ 

CAGAGTCCGACATTGACGAAAGGGAAATCTCCGTGCCTGCCGAAATCCTCAGGAAAAGCAGAAGGTTTGCCCAAGCCCTCCCCGTCTGGGCTAGGCCT 

GACTATATGTTTGCCCCTACCCTCTGGGCTAGGATGATCCTCATGACACACTTTTTCTCCGTGCTCATCGCTAGGGATCaGCTCGAGCAAGCCCTC^ 

CGTCATCCCTACCTCCGGCGATGTGGTCGTGGTCGCCACAGACGCTCTGATGACCGGATACACAGGCGATTTCGATAGCGTCATCGATTGCCATAGCA 

AAAAGAAATGCGATGAGCTCGCCGCTAAGCTCGTGGCTCTGGGAATCAATGCCGTCGCCTATTACAGAGGCCTCGACGTCGTGCTCCCCTGTAGCTTT 

ACCACACTGCCTGCCCTCAGCACAGGCCTCATCCATCTGCATCAGAATATCGTCGACGTCCAGTATCTGTATAAGGGAAGGTGGGTGCCTGGCGCTGT 

GTATGCCCTCTACGGAATGTGGCCCCTCCTGCTCCTGCTCCTGGCTCTGCCTCAGAGAGCCTATAGCCCTATCACATACTCCACCTATGGCAAATTCC 

TCGCCGATGGCGGATGCTCCGGCGGAGCCTATGACATTATCATTTGCGATGAGTGTGCCAGAAGCGTCAGGGCTAGGCTCCTGGCTAGGGGAGGCAGA 

GCCGCTATCTGTGGCAAATACCTCTTCAATTGGGCTGTGAGAACC2^AAAAGGCTGTGGCTCAGATTAACTCCGTGTGGAAGGATCTGCTCGAGGATAG 

CGTCACCCCTATCGATACCACAATCATGGCCAAAAACGAATTCACACCCTCCCCCGTCGTGGTCGGCACAACCGATAGGTCCGGCGCTCCCACATACT 

CCTGGGGAGCCAATGACACAGACGTCTTCGTCCCCGGATGCGTCCCCTGTGTGAGAGAGGGAAACGCTAGCAGATGCTGGGTGGCTATGACACCCACA

GTGGCTACCAGAGACGGAAAGCTCCAGGATTGCACAATGCTCGTGTGTGGCGATGACCTCGTGGTCATCTGTGAGTCCGCCGGAGTGCAAGAGGATGC 

CGCTAGCCTCAGGGCTGTGGCTGGCGCTCTGGTCGCCTTTAAGATTATGTCCGGCGAAGTGCCTAGCACAGAGGATCTGGTCAACCTCCTGCCTGCCA 

TTCTGTCCTACGATACCAGATGCTTTGACTCCACCGTCACCGAAAGCGATATCAGAACCGAAGAGGCTATCTATCAGTGTTGCGATCTGGATCCCCAA 

GAGCTCACCCCTGCCGAAACCACAGTGAGACTGAGAGCCTATATGAATACCCCTGGCCTCCCCGTCTGCCAAGACCATCTGGAATTCTGGCCCCAACC 

CGAATACGATCTGGAACTGATTAGCTCCTGCTCCAGCAATGTGTCCGTGGCTCT^CGATGGCGCTGGCAAAAGGGTCTACTATCTGGGA^ 

ATACCCTCACCTGTGGCTTTGCCGATCTGATGGGCTATATCCCTCTGGTCGGCGCTCCCCTCGGCGGAGCCGCTGCCATTCCCCTCGAGGTCATCAAA

GGCGGAAGGCATCTGATTTTCTGTCACTCCAAGAAAAAGTGTGACGAACTGGCTGCCAAACTGGTCGGCGGAGTGCTCGCCGCTCTGGCTGCCTATTG

CCTCAGCAGAGGCTGTGTGGTCATCGTCGGCAGAATOSTCCTGTCCGGCAAACCCGCTTGCGAAAGCGCTGGCGTCGAGGAAGACGCTGCC^ 

GAGCCTTTACCGAAGCCATGACCAGATACTCCGCCCCTCCCGGAGACCCTGGCTGGTTCACAGCCGGATACTCCGGCGGAGACATTTACCATAGCGTC 

AGCCATGCCAGACCCAGATGGTTTTGGTTTTGCCTCCTGCTCAGCTCCAGCACAAGCGGAATCACAGGCGATAACACAACCACAAGCTCCGAGCCTGC 

CCCTAGCGGATGCCCTCCCGATAGCGATGCCGAAAGGACACAGAGAAGGGGAAGGACAGGCAGAGGCAAACCCGGAATCTATAGGTTTGTGGCTCCCG

GAGAGAGACCCTCCGGCATGTTCGATGTGAGAATGTATGTGGGAGGCGTCGAGCATAGGCTCGAGGCTGCCTGTAACTGGACCAGAGGCGAAAGGTGT

GACCTCGAGGATAGGGATGAGGCTCAGCTCCACGTCTGGGTCCCCCCTCTGAATGTGAGAGGCGGAAGGGATGCCGTCATCCTCCTGATGTGCGTCGT 

GCATCCCACACTGGGAGTGAGAGCCACAAGGAAAACCTCCGAGAGAAGCCAACCCAGAGGCAGAAGGCAACCCATTCCCAAAGCCAGAAGGCCTGAGG 

GAAACGTCAGCGTCGCCCATGACGGAGCCGGAAAGAGAGTGTATTACCTCACCAGAGACCCTACCACACCCCTCGCCAGAGCCGCTTGGGAAAGCGAA 

CCCGCTCCCTCCGGCTGTCCCCCTGACTCCGACGCTGAGTCCTACTCCAGCATGCCCCGTCTGGAAGGCGAACCCGGAGACCCTATCGGAGGCCATTA 

CGTCCAGATGGCCATTATCAAACTGGGAGCCCTCACCGGAACCTATGTGTATAACCATCTGACACCCCTCAGGGATCCCTCCACCGAAGACCTCGTGA 

ATCTGCTCCCCGCTATCCTCAGCCCTGGCGCTCTGGTCGTGGGAGTGGTCTGCGCTGCCATTCTGAGAATCCTCGACATGATCGCTGGCGCTCACTGG 

GGCGTCCTGGCTGGCATTGCCTATTTCTCCATGGTCGGCAATTGGGCTAAGGTCCTGGTCGAGGGATGCGGATGGGCTGGCTGGCTGCTCAGCCCTAG 

GGGAAGCAGACCCTCCTGGGGACCCACAGACCCTAGGAGAAGGTCCAGGAATTGGACAACCCAAGGCTGTAACTGTAGCATTTACCCTGGCCATATCA 

CaGGCCATAGGATGGCCTGGGACATGATGATGAACTGGAGCCCTTGGGTCGCCATGACCCCTACCGTCGCCACAAGGGATGGCAAACTGCCTGCm 
CAGCTCAGGAGACACATTGACCTCCTGGTCGGCTCCAGGCTCTGGCATTACCCTTGCACAATCAATTACACAATCTTTAAGGTCAGGATGTACGTCGG

CGGAGTGGAACACAGACTGGAAGCCGCTGTGTTTTGCGTCCAGCCTGAGAAAGGCGGAAGGAAACCCGCTAGGCTCATCGTCTTCCCTGACCTCGGCG 

TCAGGGTCTGCGAAAAGATGATGGGATACATTCCCCTCGTGGGAGCCCCTCTGGGAGGCGCTGCCAGAGCCCTCGCCCATGGCGTCAGGGTCCTGGAA

GACGGAGTGAATGGCGGAAACGCTGGCAGAACCACAAGCGGACTGGTCAGCCTCCTGACACCCGGAGCCAAACAGAATATCCAACTGATTAACACAAA

CGGACTGGCTCTGCTCAGCTGTCTGACAGTGCCTGCCTCCGCCTATCAGGTCAGGAATAGCACAGGCCTCTACCATGTGACAAACGATTGCCCTGGCA

GAGACAAAAACCAAGTGGAAGGCGAAGTGCAAATCGTCAGCACAGCCGCTCAGACATTCCTCGCCACATGCATTAACGGAGTGTGTCCCGCTACCCAA

CTGAGAAGGCATATCGATCTGCTCGTGGGAAGCGCTACCCTCTGCTCCGCCCTCTACGTCGGCGATCTGTGTGGCTCCCACGCTCCCACAGGCTCCGG
CAAAAGCACAAAGGTCCCCGCTGCCTATGCCGCTCAGGGATACAAAGTGCTCGTGCTCAACCCTAGCGTCAGGACATGGGCTCAGCCTGGCTATCCCT

GGCCCCTCTACGGAAACGAAGGCTGTGGCTGGGCCGGATGGCTCCTGTCCCCCAGAGGCTCCACCGAAGACGTCGTGTGTTGCTCCATGTCCTACTCC
TGGACAGGCGCTCTGGTCACCCCTTGCGCTGCCGAAGAGCAAAAGCTCCCCATTGCCCTCGACACAGAGGTCGCCGCTAGCTGTGGCGGAGTGGTCCT

GGTCGGCCTCATGGCTCTGACACTGTCCCCCTATTACAAAAGGTATTGGATGAACTCCACCGGATTCACAAAGGTCTGCGGAGCCCCTCCCTGTGTGA 

TTGGCGGAGCCGGAAACAATACCCTCCACTGTCCCACAAGCGTCGAGGAAGCCTGTAGCCTCACGCCTCCGCATAGCGCTAAGTCCAAGTTTGGCTAT 

GGCGCTAAGGATGTGAGATGCCATGCCAGAATCTCCGGCATTCAGTATCTGGCTGGCCTCAGCACACTGCCTGGCAATCCCGCTATCGCTAGCCTCAT 

GGCTTTCACAGCCGCTGTGACACAGATTGTGGGAGGCGTCTACCTCCTGCCTAGGAGAGGCCCTAGGCTCGGCGTCAGGGCTACCAGAAAGACAAGCG 

AAAGGTCCCAGCCTCTGCATAGCTATAGCCCTGGCGAAATCAATAGGGTCGCCGCTTGCCTCAGGAAACTGGGAGTGCCTCCCCTCAGGGCTTGGAGA 

CACAGAACCGCTAGGCATACCCCTGTGAATAGCTGGCTGGGAAACATTATCATGTTCGCTCCCACACTGTGGGCCAGAATGATTCTGATGACCCATGA 

GAATCTGGAAACCACAATGAGAAGCCCTGTGTTTACCGATAACTCCAGCCCTCCCGCTGTGCCTCAGTCCTTCCAAGTGGCTCACCTCGCCACACCCC 

CTGGCTCCGTGACAGTGCCTCACCCTAACATTGAGGAAGTGGCTCTGTCCACCACAGGCGAAATCCCTTTCTATGGCAAACTGGTCTTCGATATCACA 

AAGCTCCTGCTCGCCGTCTTCGGACCCCTCTGGATTCTGCAAGCCTCCCTGCTCAAGGTCCCCTATTTCGTCACCGCTGCCCTCGTGATGGCCCAACT 

GCTCAGGATTCCCCAAGCCATTCTGGATATGATTGCCGGAGCCCATTGGGGAGTGCTCGCCGGATGCAATACCTGTGTGACACAGACAGTGGATTTCT 

CCCTGGATCCCACATTCACAATCGAAACCACAACCCTCCCCCAAGACGCTGTGTCCCACGGACCCACACCCCTCCTGTATAGGCTCGGCGCTGTGCAA 

AACGAAGTGACACTGACACACCCTGTGACAAAGTATATCATGACCTGTGCCAGAGTGGCTATCAAAAGCCTCACCGAAAGGCTCTACGTCGGCGGACC

CCTCACCAATAGCAGAGGCGAAAACTGTGGCTATAGGAGATGCGTCATCGGAGGCGCTGGCAATAACACACTGCATTGCCCTACCGATTGCTTTAGGA

AACACCCTGAGGCTACCTATAGCAGATGCGGAACCTGTGGCTCCAGCGATCTGTATCTGGTCACCAGACACGCTGACGTCATCCCTGTGAGAAGGAGA 

GGCGATAGCAGAGGCTCCCTGCTCAACATGTGGTCCGGCACATTCCCTATCAATGCCTATACCACAGGCCCTTGCACACCCCTCCCCGCTCCCAATTA

CACATTCGCTCTGTGGCACTCCACCGATGCCACAAGCATTCTGGGAATCGGAACCGTCCTGGATCAGGCTGAGACAGCCGGAGCCAGACTGGTCGTGC

TCGCCACATACGTCCCCGAAAGCGATGCCGCTGCCAGAGTGACAGCCATTCTGTCCAGCCTCACCGTCACCCAACTGCTCAGGAGACTGCATCAGTGG
AGGCCTAGCTGGGGCCCTACCGATCCCAGAAGGAGAAGCAGAAACCTCGGCAAAGTGATTGACACACTGACATGCGGATTCGCTGACCTCGGCCCTGA

CCAAAGGCCTTACTGTTGGCATTACCCTCCCAAACCCTGTGGCATTGTGCCTGCCAAAAGCGTCTGCGGACCCGTCTACTGTGAGGAATGCTCCCAGC

ATCTGCCTTACATTGAGCAAGGCATGATGCTCGCCGAACAGTTTAAGCAAAAGGCTCTGGGACTGCTCCAGACATACCAAGCCACAGTGTGTGCCAGA 

GCCCAAGCCCCTCCCCCTAGCTGGGACCAAATGTGGAAGTGTCTGATTAGGCTCAAGCCTACCCTCTGCGGAATCGTCCCCGCTAAGTCCGTGTGTGG

CCCTGTGTATTGCTTTACCCCTAGCCCTGTGGTCGTGGGAACCACAGACAGAAGCGGAAGCTCCCTGACAGTGACACAGCTCCTGAGAAGGCTCCACC 

AATGGATTAGCTCCGAGTGTACCACACCCTGTAGCGGAAGCTGGCTGAGAGACCTCAGCGATGGCTCCTGGTCCACCGTCAGCTCCGAGGCTGGCACA 

GAGGATGTGGTCTGCTGTAGCATGAGCTATAGCTGGACCGGATGGGATCAGATGTGGAAATGCCTCATCAGACTGAAACCCACACTGCATGGCCCTAC 

CCCTCTGCTCTACAGACTGGGAGCCGTCCAGAATCTGGCTGAGCAATTCAAACAGAAAGCCCTCGGCCTCCTGCAAACCGCTAGCAGACAGGCTGAGG 

TCATCGCTCCCGCTGTGCAAACCAATTGGCAAAAGCTCGAGGTCTTCTGGGCCAAACACATGTGGAATTTCATTAGCGGAATCCAATACCTCGCCGGA 

CTGTCCACCCTCCCCGGACTGATTGCCTTTGCCTCCAGGGGAAACCATGTGTCCCCCACACACTATGTGCCTGAGTCCGACGCTGCCGCTAGGGTCAC 

CGCTATCCTCGCCACACTGTGTAGCGCTCTGTATGTGGGAGACCTCTGCGGAAGCGTCTTCCTCGTGGGACAGCTCTTCACATTCTCCCCCAGAAGGC 

ATAGCTCCGTGCTCTGCGAATGCTATGACGCTGGCTGTGCCTGGTACGAACTGACACCCGCTGAGACAACCGTCAGGCTCAGGGCTTACATGGGCTGG 

6TGGCTGCCCAACTGGCTGCCCCTGGCGCTGCCACAGCCTTTGTGGGAGCCGGACTGGCTGGCGCTGCCATTGGCTCCGTGGGAAGCTGGCACATTAA 

CTCCa^CCGCTCTGAATTGCAATGAGTCCCTGAATACCGGATGGCTCGCCGGACrrGTTTTACCAACACSiAATT 

TmGGCATCACAATCTGGTCTACTCCACCACAAGCAGAAGCGCTTGCCAAAGGa^AAAGAAAGTGACAGCCGCTATGT 

AGGAAAACCAAAAGGAATACay^TAGGAGACCCCAAGACGTCAAGTTTCCCGGAGGCGGAAGCCAAACCAAACAGTCCGGCGAAM 

G6TCGCCTATCAGGCTACCGTCTGCGCTAGGGCTCAGGCTCCCCCTCCCTCC6CCCCTACCTATAGCTGGGGCGCTAACGATACCGATGTGTTTGT6C 

TCAACAATACCAGACCCCCTCTGGGAAACTGGTTCGGATGCACAGTGCCTCCCCCTAGGAAAAAGAGAACCGTCGTGCTC^ 

ACCGCTCTGGCTGAGCTCGCCACaAAGTCCTTCGGAAGCTVCAACCTCCAGGTCCGCCTGTaiGAGACAGAAAAAGGTaiCOT 

GCTCGACTCCCACTATCAGGATGTGCTCGACCAAGCCGAAACCGCTGGCGCTAGGCTCGTGGTCCTGGCTACCGCTACCCCTCCCGGAAGCGTCACCG 

TCCCCCATCCCAATATCGAATTCCATTACGTCACCGGAATGACAACCGATAACCTCAAGTGTCCCTGTCAGGTCCCCTCCCCCGAATTCTTTACCGAA 

CTGGATGGCGTCCTGAAACTGACACCCATTGCCGCTGCCGGAAGGCTCGACCTCAGCGGATGGTTTACCGCTGGCTATAGCGGAGGCGATATCTATCA 

CTCCGCCTCCAGGCAAGCCGAAGTGATTGCCCCTGCCGTCCAGACAAACTGGCAGAAACTGGAAGTGTTTTGGGCTAAGCATATGTGGAACTTTTGCA 

GAGCCTCCGGCGTCCTGACAACCTCCTGCGGAAACACACTGACATGCTATATCAAAGCCAGAGCCGCTTGCAGAGCCGCTGGCCTCTTCGATAGGCTC 

CAGGTCCTGGATAGCCATTACCAAGACGTCCTGAAAGAGGTCAAGGCTGCCGCTAGCAAAGTGAAAGCCAATCTGCTCGGCCCTCTGACAAACTCCAG 

GGGAGAGAATTGCGGATACAGAAGGTGTAGGGCTAGCGGAGTGCTCAGCACAAGCTGTGGCAATACCCTCATCATGCACACAAGGTGTCACTGTGGCG 

CTGAGATTACCGGACACGTCAAGAATGGCACAATGAGAATCGTCGGCCCTAGGACATGCAGAGAGGTCAGCTTTAGGGTCGGCCTCCACGAATACCCT 

GTGGGAAGCCAACTGCCTTGCGAACCCGAACCCGATGTGGCTGTGCTCACCTCCAAGGAAGTGAAAGCCGCTGCCTCCAAGGTCAAGGCTAACCTCCT 

GTCCGTGGAAGAGGCTTGCTCCCTGACACCCCCTCACTCCGCCAAAGGCAGAGACGCTGTGATTCTGCTCATGTGTGTGGTCCACCCTACCCTCGTGT 

TTGACATTACCAAACTGCTCCTGGCTGTGTTTGGCCCTATGCTCACCGATCCCTCCCACATTACCGCTGAGGCTGCCGGAAGGAGACTGGCTAGGGGA 

AGCCCTCCCTCCATGGCTAGCTCCAGCGCTAGCCCTAGGCCTATCTCCTACCTa^AGGGAAGCTCCGGCGGACCCCTCCTGTGTCCCGCTGGCCATGC 

CGTCGGCATTTTCAGAGCCGCTGACTTTGACC2\AGGCTGGGGCCCTATCTCCTACGCTAACGGAAGCGGACCCGATCAGAGACCCTATTGCTGGCACT 

ATCCCCCTAAGCCTAGGCATGTGGGACCCGGAGAGGGAGCCGTCCAGTGGATGAATAGGCTCATCGCTTTCGCTAGCAGAGGCAATCACGTCA^ 

ACCCATTGCCTCTGGATGATGCTCCTGATTAGCCAAGCCGAAGCCGCTCTGGAAAACCTCGTGATTCTGAATGCCGCTAGCCTCGCCGGAACCCATAT 

CATTCCCGATAGGGAAGTGCTCTACAGAGAGTTTGACGAAATGGAAGAGTGTAGCCAACaVCCTCCCCTATATCGAACAGGGAATGATGCT^ 

TCCACCAAAACATTGTGGATGTGCAATACCTCTACGGAGTGGGAAGCTCCATCGCTAGCTGGGCCyiTTAAGTGGGAGTATGTC 

AGGTGGTTCTGGTTCTGTCTGCTCCTGCTCGCCGCTG6CGTCGGCATTTACCTCCTGCCTAACAGAGCCGCTGCCGCTACCCTCGGCTTT6GCGCTTA 
CATGAGC^iAAGCCCATGGCATTGACCCTAACATTAGGACAGGCGTCAGGACAATCACAACCGGAAGGGTCCAGGGACTGCTC^ 

CTAGGAAAATGATTGGCGGACACTATGTGCAAATGGCTATCATTAAGCTCGGCGCTAGGAGATTCGCTCAGGCTCTGCCTGTGTGGGCCAGACCCGAT 
TACAATCCCCCTCTGGTCGAGACATGGAAAAAGCCTGACTATGAGCCTACCGCTGCCCAAACCTTTCTGGCTACCTGTATCAATGGCGTCTGCTGGAC 
CGTCTACCATGGCGCTGGCACAAGGACAATCGCTAGCCCTTGGGCTCACAATGGCCTCAGGGATCTGGCTGTGGCTGTGGAACCCGTCGTGTTTAGCC 
AAATGGAAACCAAACTGATTACCTGGGGCGCTAAGGGACCCGTCATCCAAATGTATACCAATGTGGATCAGGATCTGGTCGGCTGGCCCGCTCCCCAA 
GGCTCCAGGTCCCTGACACCCTGTAAGGTCGTGATTCTGGATAGCTTTGACCCTCTGGTCGCCGAAGAGGATGAGAGAGAGATTAGCGTCCCCGCTGA 
GATTCTGAGAAAGTCCCTGACAGGCACATACGTCTACAATCACCTCACCCCTCTGAGAGACTGGGCCCATAACGGACTGAGAGACCTCGCCGTCGCCG 
TCGAGCCTGTGTGTACCAGAGGCGTCGCCAAAGCCGTCGACTTTATCCCTGTGGAAAACCTCGAGACAACCATGAGGTCCCCCGTCTTCACAGACAAT 
GCCCTCG6CATTAACGCTGTGGCTTACTATAGGGGACTGGATGTGTCCGTGATTCCCACAAGCGGAGAC6TCGTGGTCGT6GCTACCGATATGTCCGC 
CGATCTGGAAGTGGTCACCTCCACCTGGGTGCTCGTGGGAGGCGTCCTGGCTGCCCTCGCCGCTTACTGTCTGTCCACCGGAGCCCTCATGACAGGCT

ATACCGGAGACTTTGACTCGGTGATTGACTGTAACACATGCGTCACCCAAACCGTCGACTTTAGCCTCGACCCTAACACAAAOVGAAGGCCTCAGGAT 

GTGAAATTCCCTGGCGGAGGCCAAATCGTCGGCGGAGTGTATCTGCTCCCCAGAAGGGGACCCAGAAGGGCTCTGGCTCACGGAGTGAGAGTGCTCGA 

GGATGGCGTCAACTATGCCACAGGCAATCTGCCTGGCTGTAGCTTTAGCATTTTCCTCAGCAAATTCGGATACG6AGCCAAAGACGTCAGGTGTCACG' 

CTAGGAAAGCCGTCGCCCATATCAATAGCGTCTGGAAAGACCTCCTGGAAACCCCTGGCGCTAAGCAAAACATTCAGCTCATCAATACCAATGGCTCC 

TGGCATATCAATAGCACAGCCCTCAACTGTAACGAAAGCCTCAACACAGGCTGGCTGGCTGGCCTCTTCTATCAGCATAAGTTTAACTCCAGCGGATG 

CCCTGAGAGACTGGCTAGCTGTAGGAGACTGACAGTGGTCCTGCTCTTCCTCCTGCTCGCCGATGCCAGAGTGTGTAGCTGTCTGTGGATGATGCTGC 

TCATCTCCCAGGCTGAGGCTGCCCTCGACTGTGAGATTTACGGAGCCTGTTACTCCATCGAACCCCTCGACCTCCCCCCTATCATTCAGAGACTGCAT 

GGCCTCAGCGCTTTCTCCTGGACAGTGTATCACGGAGCCGGAACCAGAACCATTGCCTCCCCCAAAGGCCCTGTGATTCAGATGTACACAAACGTCGA 

CCAAGACCTCTACAGATTCGTCGCCCCTGGCGAAAGGCCTAGCGGAATGTTTGACTCCAGCGTCCTGTGTGAGTGTTACGATGCCGGATGCGCTTGGT 

ATAGGTCCGAGCTCAGCCCTCTGCTCCTGTCCACCACACAGTGGCAGGTCCTGCCTTGCTCCTTCACAACCCTCCCCGCTCTGTCCACCGGACTGAGA 

AAGCTCGGCGTCCCCCCTCTGAGAGCCTGGAGGCATAGGGCTAGGTCCGTGAGAGCCAGACTGCTCGCCAGAGGCGGAAGGGCTAGCCCTCTGACAAC 

CTCCCAGACACTGCTCTTCAATATCCTCGGCGGATGGGTCGCCGCTCAGCTCGCCGCTCCCGGAGCCGCTACCGCTCTGTGGATCCTCCAGGCTAGCC 

TCCTGAAA6TGCCTTACTTTGTGAGAGTGCAAGGCCTCCTGAGAATCTGTGCCCTCGCCAGAAAGATGGTGAAAAACGGAACCATGAGGATTGTGGGA 

CCCAGAACCTGTAGGAATATGTGGAGCGGAACCTTTCCCiATTAACGCTTACACAACCGGAGAGGTCGCCCTCAGCACAACCGGAGAGATTCCCTTTTA 

CGGAAAGGCTATCCCTCTGGAAGTGATTAAGGGAGGO^GACACCrCATCTTTCTGACAAGGGATCCCAC^ 

CAGCCaGACACTkCACCCGTCAACTCCTGGCTCGGaVATATCATTAGGGTaVGCGCTGAGGAATACGTCGAGATTAGGA 

GTGACAGGCATGACCACAGACAATCTGAAATGCCCTCCCGTCGTGCATGGCTGTCCCCTCCCCCCTCCCAGAAGCCCTCCCGTCCCCCCTCCCAGAAA 

GAAAAGGACAGTGGTCCTGACAGAGTCCACCCTCAGCACAGCCCTCGCCGAACTGGCTACCAAAAGCTTTGGCTCCyiGCTCCACCTCCGGCA 

GAGACAATACCACAACCTCCGTGTCCTGCCAAAGGGGATAO^AAGGCGTCTGGAGAGGCGATGGCATTATGCATACCAGATGCCATTGCGGA 

ATCACAGGCCATGTGTTTCTGGTCGGCCAACTGTTTACCTTTAGCCCTAGGAGACACTGGACCACACAGGGATGa^TTGCTCCATOT 

CATTTTCGTCGGCGCTGGCCTCGCCG6AGCCGCTATCGGAAGCGTCGGCCTCGGCAAAGTGCTCGTGGATATCCTCGCCGGATACGGA6CCGGAGACA 

TTTGGGATTGGATTTGCGAAGTGCTCAGCGATTTCAAAACCTGGCTGAAAGCCAAACTGATGCCCCAACTGCCTGGCATTCCCTTTAACTCCAGCATT 

GTGTATGAGGCTGCCGATGCCATTCTGCATACCCCTGGCTGTGTGCCTTGCGTCAGGGAAGGCAATGCCTCCAGGTGTAGCTCCGGCTGTCCCGAAAG 

GCTCGCCTCCTGCAGAAGGCTCACCGATTTCGATCAGGGATGGGGACCCATTAGCTATGCCAATGGCTCCAGGACAGAGGAAGCCATTTACCAATGCT 

GTGACCTCGACCCTCAGGCTAGGGTCGCCATTAAGTCCCTGACAGAGAGACTGTATGTGGGAGTGTCCAAGGGATGGAGACTGCTCGCCCCTATCACA 

GCCTATGCCCAACAGACAAGGGGACTGCTCGGCTGTATCATTACCTCCCTGACATTCTTTAGCGTCCTGATTGCCAGAGACCAACTGGAACAGGCTCT 

GGATTGCGAAATCTATGGCGCTTGCTATAGCATTGAGCCTCTGGATTGCCAAGTGCCTAGCCCTGAGTTTTTCACAGAGCTCGACGGAGTGAGACTGC 

ATAGGTTTGCCCCTCCCTGTAAGCCTCTGCTCAGGGAAACCTGTTACATTAAGGCTAGGGCTGCCTGTAGGGCTGCCGGACTGCAAGACTGTACCATG 

CTGGTCTGCGGAGACGATCTGGTCGTGATTATCGATCCCAATATCAGAACCGGAGTGAGAACCATTACCACAGGCTCCCCCATTACCTATAGCACATA 

aSGAAAGTTTCTGGCTGACGGATGCAATTGGACAAGGGGAGAGAGATGCGATCTGGAAGACAGAGACAGAAGCGAACTGTCCCCCCTCCTC 

CAACCCAATG6CAAACCGGACACAGAATGGCTTGGGATATGATGATGAATTGGTCCCCCACAGCCGCTCTGGTCATGGCTCAGCTCCTGAGAATCCCT 

CAGGCTCCCGGAGCCCTCGTGGTCGGCGTCGTGTGTGCCGCTATCCTCAGGA6ACACGTCGGCCCTG6CGAAGGCGCTGTGCAATGGATGAACAGACA 

CGATAGCCCTGACGCTGAGCTCATCGAAGCCAATCTGCTCTGGAGACaGGAAATGGGAGGCAATATCACa^GGG 

TTACCGGACTGACACACATTGACGCTCyiCTTTCTGTCCCAGACAAAGCAAAGCGGAGAGAATTTCCCTTACCTCGTGGCTAGGGGM 

ATCCCTAAGGCTAGGAGACCCGAAGGCAGAACCTGGGCCCAACCCGGATACCCTTGGCCTCTGTATGGCS^TCTGATTGTGTTTCCCG 

GAGAGTGTGTGAGAAAATGGCTCTGTATGACGTCGTGTCCy^AGCTCCCCCTCGCCGTCa^TGGGATACGCTACCGGAAACCTCCCCGGATGCTCCTT^ 

CCATCTTTCTGCTCGCCCTCCTGTCCTGCCTCACCGTCCCCGCTAGCGCTTACCAACTGGGAAAGGTCCTGQTCGACATTCTGGCTGGCTATGGCGCT 

GGCGTCGCCGGAGCCCTCGTGGCTTTCAAAATCATGAGCGGAGAGGTCAGCTATAGCTCCATGCCTCCCGTCGAGGGAGAGCCTGGCGATCCCGATCT 

GTCCGACGGAAGCTGGAGCACAGTGTCCAGCGAAGCCGGAAGGCAAGAGATGGGCGGAAACATTACCAGAGTGGAAAGCGAAAACAAAGTGGTCATCC 

TCGACTCCTTCGATCCCCTCGTGGCTGAGGAAGTGGGATGGCCTGCCCCTCAGGGAAGCAGAAGCCTCACCCCTTGCACATGCGGAAGCTCCGACCTC 

TACCTCGTGACAAGGCATGCCGATGGCTGTAGCGGAGGCGCTTACGATATCATTATCTGTGACGAATGCCATAGCACAGACGCTACCTCCATCCTCGG 

CATTGGCACAGTGCTCACCTTTACCATTGAGACAACCACACTGCCTCAGGATGCCGTCAGCAGAACCCAAAGGAGAGGCAGAACCGGAAGGGGAAAGC 

CTGGCATTGACTGTTTCAGAAAGCATCCCGAAGCCACATACTCCAGGTGTGGCTCCGGCCCTTGGATTACCCCTAGGTGTCTGGTCGACTATCCCTAT 

CTGGCTGCCGGAGTGGGAATCTATCTGCTCCCCAATAGGGCTGCCGCCCTCGTGACACCCTGTGCCGCTGAGGAACAGAAACTGCCTATCAATGCCCT 

CAGCAATAGCCTCCTGAGACACCATAACCTCGTGTATATCTCCAGCGAATGCACAACCCCTTGCTCCGGCTCCTGGCTCAGGGATATCTGGGACTGGA 

TCTGTGAGGTCCTGTCCGACTTTAAGACATGGCTCAAGGCTAAGCTCATGCCTCAGCTCCCCGGAATCCCTTTCGTCAGCTGTCAGAGAGGCTATAAG 

GGAGTGTGGAGGGGAGACGGAAGCGGACCCTGGATCT^CACCCaiGATGCCTCGTGGATTACCCTTACAGACTGTGGCACTATCCCTGTACCA 

TACCATTTTCAAA 



HepC Savine cassette sequences (A+B+C), with specific restriction sites removed, which can be joined
to generate a single expressible open reading frame that encodes the HepC Savine protein above



Cassette A 

ggcggatccccaccATGGTGATTCCCGTCAGGAGAAGGGGAGACTCCAGGGGAAGCCTCCTGTCCCCCAGACCCATTAGC 
TATCTGAAAGGCTCCAGCGGAGGCCCTGCCAGAAGGGGAAGGGAAATCCTCCTGGGACCCGCTGACGGAATGGTCAGCAA 
AGGCTGGAGGCTCCTGGCTCCCATTACCGCTTACGCTAGGCTCCACAGATTCGCTCCCCCTTGCAAACCCCTCCTGAGAG 
AGGAAGTGTCCTTCAGAGTGGGACTGCATGAGTATCCCGTCGGCTCCGTGGTCTTCTCCCAGATGGAGACAAAGCTCATC 
ACATGGGGAGCCGATACCGCTGCCTGTGGCGATATCATTAACGGACTGCCTGTGTCCCTGCTCTGCCCTGCCGGACACGC 
TGTGGGAATCTTTAGGGCTGCCGTCTGCACAAGGGGAGTGGCTAAGGCTGTGGATTTCATTCCCGTCTGCGTCGTGATTG 
TGGGAAGGATTGTGCTCAGCGGAAAGCCTGCCATTATCCCTGACAGAGAGGTCCTGTATAGGGAgTTtGATGAGATGCCC 
TGTACCCCTCTGCCTGCCCCTAACTATACCTTTGCCCTCTGGAGAGTGTCCGCCGAAGAGTATGTGGAAATCAGAAGGGT 
CGGCGATGCCCTCTACGATGTGGTCAGCAAACTGCCTCTGGCTGTGATGGGCTCCAGCTATGGCTTTCAGTATAGCCCTG 
GCCAAAGGGTCGAGTTTATCTCCTGGTGTCTGTGGTGGCTCCAGTATTTCCTCACCAGAGTGGAAGCCCAACTGCATGTG 
TGGGTGCCTCCCCTCAACGTCAG6GGAGA6AATCTGGTCATCCTCAACGCTGCCTCCCTGGCTGGCACACACGGACTGGT 
CAGCTTTCTGGTCTTCTTTTGCTTTGCCTGGTACCTCCTGCCTCCCATTATCCAAAGGCTCCACGGACTGTCCGCCTTTA 
GCCTCCACTCCTACTCCCCCGGAGAGATTAACAGAGTGGCTGCCTGTAACCCTCCCCTCGTGGAAACCTGGAAGAAACCC 
GATTACGAACCCCCTGTG6TCCACGGATGCCCTCTGCCTCCCCCTAG6TCCCCCCCTGGCGTCGGCTCCAGCATTGCCTC 
CTGGGCTATCAAATGGGAATACGTCGTGCTCCTGTTTCTGCTCCTGGCTGACGCTAGGGTCTGCTCCCTGAATAACACAA

GGCCTCCCCTCGGCAATTGGTTTGGCTGTACCTGGATGAATAGCACAGGCTTTACCAAAGTGTGTGGCGCTCCCCCTTTC 

ACAGAGGCTATGACAA.GGTATAGCGCTCCCCCTGGCGATCCCCCTCAGCCTGAGTATGACCTCGAGCTCATCACAAGCTG 

TAGCTCCTGGCCTCTGCTCCTGCTCCTGCTCGCCCTCCCCCAAAGGGCTTACGCTCTGGATACCGAAGTGGCTGCCTCCT 

GCGGAGGCGTCGTGCTCCAGCAAACCAGAGGCCTCCTGGGATGCATTATCACAAGCCTCACCGGAAGGGATAAGAATCAG 

GTCGAGGGAGAGGTCCAGATTGTGTCCAGCTCCCCCCCTGCCGTCCCCCAAAGCTTTCAGGTCGCCCATCTGCATGCCCC 

TACCGGAAGCGGAAAGTCCACCAAAGTGCCTGCCGCTAACACACCCGGACTGCCTGTGTGTCAGGATCACCTCGAGTTTT 

GGGAAGGCGTCTTCACAGGCCTCACCCATATCGATGCCCATTTCCTCGTGCTCCTGCTCTTCGCTGGCGTgGAtGCTGAG 

ACACACGTCACCGGAGGCAATGCCGGAAGGACAACCTCCGGCCTCGTGTCCCTGCTCGAGGTCACCCTCACCCATCCCGT 

CACCAAATACATTATGACATGCATGAGCGCTGACCTCGAGGTCGTGACAAGCACATGGGTCCTGGTCGTGGGACTGATGG 

CCCTCACCCTCAGCCCTTACTATAAGAGATACATTAGCTGGTGCCTCTGGTGGCTGCAATACTTTCTGACAAGGGTCGCC 

ATTTGCGGAAAGTATCTGTTTAACTGGGCCGTCAGGACAAAGCTCAAGCTCACCCCTATCGCTGCCGCTGGCAGACTGGA 

TCTGTCCATCGCTTACTTTAGCATGGTGGGAAACTGGGCCAAAGTGCTCGTGGTCCTGCTCCTGTTTGCCGGAGTGGATG 

CCGAAACCCATGTGACAAGGCTC6CCAGAGGCTCCCCCCCTAGCATGGCCTCCAGCTCCGCCTCCCAGCTCAGCGCTCCC 

TCCCTGAAAGCCACaiTGCACaGCCAATGGCCTCGTGTCCTTCCTCGTGTTTTTCTGTTTCGCTTGGTATCTGAAAG 

ATGGGTCCCCG6AGCCGTCTACGCTCTGTATGGCATGCAGCTCCCCTGTGAGCCTGA6CCT6ACGTCGCCGTCCTGACAA 

GCTITGCTGACAGACCCTAGCCATATCACT^GCCGAAGCCGCTGGCAGAGACTCCGTGACACCCATTGACAC^ 

GCTAAGAATGAGGTCTTCTGTGTGCAACCCGAAAAGGGAGGCAGAAAGCCTGCCAGATACGCTGCCCAAGGCTATAAGGT 

CCTGGTCCTGAATCCCTCCGTGGCTGCCACACTGGGATTCGGAGCCTATATGTCCAAGGCTCACGGAGTGAGAAACTCCA 

CCGGACTGTATCACGTCaiCCAATGACTGTCCCAATAGCTCCATCGTCTACGAAGCCGCTGACGCTATCCTCCACACAAGC 

TCCTACGGATTCCAATACTCCCCCGGACAGAGAGTGGAgTTtCTCGTGCAAGCCTGGAAGTCCAAGAAAACCCCTATGGG 

ATTCTCCGACACAGCCGCTT6CGGAGACATTATCAATGGCCTCCCCGTCAGCGCTAGGAGAGGCAGAGAGATXCTGCTCG 

GCCCTGCCGATGGCATGAGCCAACTGTCCGCCCCTAGCCTCAAGGCTACCTGTACCGCTAACCATGACTCCCCCGATGCC 

GAACTGATTGAGGCTAACCTCCTGTGGAACCCTGCCATTGCCTCCCTGATGGCCTTTACCGCTGCCGTCACCTCCCCCCT 

CACCACAAGCCAAACCCTCCTGTTTAACATTCTGGGACTGGTCCAGGCTTGGAAAAGCAAAAAGACACCCATGGGCTTTA 

GCTATGACACAAGGTGTTTCGATAGCACAGTGACAGAGTCCGACATTGACGAAAGGGAAATCTCCGTGCCTGCCGAAATC 

CTCAGGAAAAGCAGAAGGTTTGCCCAAGCCCTCCCCGTCTGGGCTAGGCCTGACTATATGTTTGCCCCTACCCTCTGGGC 

TAGGATGATCCTCATGACACACTTTTTCTCCGTGCTCATCGCTAGGGATCAGCTCGAGCAAGCCCTCAGCGTCATCCCTA 

CCTCCGGCGATGTGGTCGTGGTCGCCACAGACGCTCTGATGACCGGATACACAGGCGATTTCGATAGCGTCATCGATTGC 

CATAGCAAAAAGAAATGCGATGAGCTCGCCGCTAAGCTCGTGGCTCTGGGAATCAATGCCGTCGCCTATTACAGAGGCCT 

CGACGTCGTGCTCCCCTGTAGCTTTACCACACTGCCTGCCCTCAGCACAGGCCTCATCCATCTGCATCAGAATATCGTgG 

AtGTCCAGTATCTGTATAAGGGAAGGTGGGTGCCTGGCGCTGTGTATGCCCTCTACGGAATGTGGCCCCTCCTGCTCCTG 

CTCCTGGCTCTGCCTCAGAGAGCCTATAGCCCTATCACaVTACTCCACCTATGGCAfiATTCCTCGCCGATGGCGGATGCTC 

CGGCGGAGCCTATGACATTATCATTTGCGATGAGTGTGCCAGAAGCGTCAGGGCTAGGCTCCTGGCTAGGGGAGGCAGAG 

CCGCTATCTGTGGCaVAATACCTCTTCS^TTGGGCTGTGAGAACCAAAAaGGCTGTGGCTCACATTAACTCCGTGTGGAAG 

GATCTGCTCGAGGATAGCGTCACCCCTATCGATACCACAATCATGGCCAAAAACGAgTTtACACCCTCCCCCGTCGTGGT 

CGGCACAACCGATAGGTCCGGCGCTCCCaiCATACTCCTGGGGAGCCaATGACACAGACGTCTTCGTCCCCGGATGCGTCC 

CCTGTGTGAGAGAGGGAAAC6CTAGCAGATGCTGGGTGGCTATGACACCCACAGTGGCTACCAGAGACGGAAAGCTCCAG 

GATTGCACAATGCTCGTGTGTGGCGATGACCTCGTGGTCATCTGTGAGTCCGCCGGAGTGCAAGAGGATGCCGCTAGCCT 

CAGGGCTGTGGCTGGCGCTCTGGTCGCCTTTAAGATTATGTCC6GCGAAGTGCCTAGCACAGAGGATCTGGTCAACCTCC 

TGCCTGCCATTCTGTCCTACGATACCAGATGCTTTGACTCCACCGTCACCGAAAGCGATATCAGAACCGAAGAGGCTATC 

TATCAGTGTTGCGATCTcGAcCCCCAAGAGCTCACCCCTGCCGAAACCACAGTGAGACTGAGAGCCTATATGAATACCCC 

TGGCGTCCCCGTGTGCCAAGACCATCTGGAgTTtTGGCCCCAACCCGAATACGATCTGGAACTGATTACCTCCTGCTCCA 

GCAATGTGTCCGTGGCTCACGATGGCGCTGGCAAAAGGGTCTACTATCTGGGAAAGGTCATCGATACCCTCACCTGTGGC 

TTTGCCGATCTGATGGGCTATATCCCTCTGGTCGGCGCTCCCCTCGGCGGAGCCGCTGCCATTCCCCTCGAGGTCATCAA 

AGGCGGAAGGCATCTGATTTTCTGTCACTCCAAGAAAAAGTGTGACGAACTGGCTGCCAAACTGGTCGGCGGAGTGCTCG 

CCGCTCTGGCTGCCTATTGCCTCAGCACAGGCTGTGTGGTCATCGTCGGCAGAATCGTCCTGTCCGGCAAACCCGCTTGC 

GAAAGCGCTGGCGTCCAGGAAGACGCTGCCTCCCTGAGAGCCTTTACCGAAGCCATGACCAGATACTCCGCCCCTCCCGG 

AGACCCTGGCTGGTTCACAGCCGGATACTCCGGCGGAGACATTTACCATAGCGTCAGCCATGCCAGACCCAGATGGTTTT 

GGTTTTGCCTCCTGCTCAGCTCCAGCACAAGCGGAATCACAGGCGATAACACAACCACAAGCTCCGAGCCTGCCCCTAGC 

GGATGCCCTCCCGATAGCGATOCCGAAAGGACACAGAGAAGGGGAAGGACAGGCAGAGGCAAACCCGGAATCTATAGGTT 

TGTGGCTCCCGGAGAGAGACCCTCCGGCATGTTCGATGTGAGAATGTATGTGGGAGGCGTCGAGCATAGGCTCGAGGCT6 

CCTGTAACTGGACCAGAGGCGAAAGGTGTGACCTCGAGGATAGGGATGAGGCTCAGCTCCACGTCTGGGTCCCCCCTCT6 

AATGTGAGAGGCGGAAGGGATGCCGTCATCCTCCTGATGTGCGTCGTGCATCCCACACTGGGAGTGAGAGCCACl^GGAA 

AACCTCCGAGAGAAGCCAACCCAGAGGCAGAAGGC^CCCATTCCCa^AAGCCAGAAGGCCTGAGGGAAACGTCAG 

CCCATGACGGAGCCGGAAAGA6AGTGTATTACCTCACCA6AGACCCTACCACACCCCTCGCCAGAGCCGCTTGGGAAAGC 

GAACCCGCTCCCTCCGGCTGTCCCCCTGACTCCGACGCTGAGTCCTACTCCAGCATGCCCCCTCTGGAAGGCGAACCCG6 

AGACCCTATCGGAGGCCATTACGTCCAGATGGCCATTATCAAACTGGGAGCCCTCACCGGAACCTATGTGTATAACCATC 

TGACACCCCTCAGaGAcCCCTCCACCGAAGACCTCGTGAATCTGCTCCCCGCTATCCTCAGCCCTGGCGCTCTGGTCGTG 

GGAGTGGTCTGCGCTGCCATTCTGAGAATCCTCGACATGATCGCTGGCGCTCACTGGGGCGTCCTGGCTGGCATTGCCTA 

TTTCTCCATGGTCGGCAATTGGGCTAAGGTCGTGGTCGAGGGATGCGGATGGGCTGGCTGGCTGCTCAGCCCTAGGGGAA 

GCAGACCCTCCTGGGGACCCACAGACCCTAGGAGAAGGTCCAGGAATgtcgactgagaattcgcc 



Cassette B 



ggcggatccaccatgctcgagTGGACAACCCAAGGCTGTAACTGTAGCATTTACCCTGGCCATATCACAGGCCATAGGAT 
GGCCTGGGACATGATGATGAACTGGAGCCCTTGGGTCGCCATGACCCCTACCGTCGCCACAAGGGATGGCAAACTGCCTG 
CCACACAGCTCAGGAGACACATTGACCTCCTGGTCGGCTCCAGGCTCTGGCATTACCCTTGCACAATCAATTACACAATC 
TTTAAGGTCAGGATGTACGTCGGCGGAGTG6AACACAGACTGGAAGCCGCTGTGTTTTGCGTCCAGCCTGAGAAAGGCG6 
AAGGAAACCCGCTAGGCTCATCGTCTTCCCTGACCTCGGCGTCAGGGTCTGCGAAAAGATGAT6GGATACATTCCCCTCG 
TGGGAGCCCCTCTGGGAGGCGCTGCCAGAGCCCTCGCCCATGGCGTCAGGGTCCTGGAAGACGGAGTGAATGGCGGAAAC 
GCTGGCAGAACCACAAGCGGACTGGTCAGCCTCCTGACACCCGGAGCCAAACAGAATATCCAACTGATTAACACAAACGG

ACTGGCTCTGCTCAGCTGTCTGACAGTGCCTGCCTCCGCCTATCAGGTCAGGAATAGCACAGGCCTCTACCATGTGACAA 

ACGATTGCCCTGGCAGAGACAAAAACCAAGTGGAAGGCGAAGTGCAAATCGTCAGCACAGCCGCTCAGACATTCCTCGCC 

ACATGCATTAACGGAGTGTGTCCCGCTACCCAACTGAGAAGGCATATCGATCTGCTCGTGGGAAGCGCTACCCTCTGCTC 

CGCCCTCTACGTCGGCGATCTGTGTGGCTCCCACGCTCCCACAGGCTCCGGCAAAAGCACAAAGGTCCCCGCTGCCTATG 

CCGCTCAGGGATACAAAGTGCTCGTGCTCAACCCTAGCGTCAGGACATGGGCTCAGCCTGGCTATCCCTGGCCCCTCTAC 

GGAAACGAAGGCTGTGGCTGGGCCGGATGGCTCCTGTCCCCCAGAGGCTCCACCGAAGACGTCGTGTGTTGCTCCATGTC 

CTACTCCTGGACAGGCGCTCTGGTCACCCCTTGCGCTGCCGAAGAGCAAAAGCTCCCCATTGCCCTCGACACAGAGGTCG 

CCGCTAGCTGTGGCGGAGTGGTCCTGGTCGGCCTCATGGCTCTGACACTGTCCCCCTATTACAAAAGGTATTGGATGAAC 

TCCACCGGATTCACAAAGGTCTGCGGAGCCCCTCCCTGTGTGATTGGCGGAGCCGGAAACAATACCCTCCACTGTCCCAC 

AAGCGTCGAGGAAGCCTGTAGCCTCACCCCTCCCCATAGCGCTAAGTCCAAGTTTGGCTATGGCGCTAAGGATGTGAGAT 

GCCATGCCAGAATCTCCGGCATTCAGTATCTGGCTGGCCTCAGCACACTGCCTGGCAATCCCGCTATCGCTAGCCTCATG 

GCTTTCACAGCCGCTGTGACACAGATTGTGGGAGGCGTCTACCTCCTGCCTAGGAGAGGCCCTAGGCTCGGCGTCAGGGC 

TACCAGAAAGACAAGCGAAAGGTCCCAGCCTCTGCy^TAGCTATAGCCCTGGCGAAATCAATAGGGTC^ 

GGAAACTGGGA6TGCCTCCCCTCAGGGCTTGGAGACACAGAACCGCTAGGCATACCCCTGTGAATAGCTGGCTGGGAAAC 

ATTATCT^TGTTCGCTCCCACACTGTGGGCCaGAATGATTCTGATGACCCaTGAGAATCTGGAAACCACAATGAGAA 

TGTGTTTACCGATAACTCCAGCCCTCCCGCTGTGCCTCAGTCCTTCCAAGTGGCTCACCTCGCCACACCCCCTGGCTCCG 

TGACAGTGCCTCACCCTAACATTGAGGAAGTGGCTCTGTCCACCACAGGCGAAATCCCTTTCTATGGCAAACTGGTCTTC 

GATATCaCAAAGCTCCTGCTCGCCGTCTTCGGACCCCTCTGGATTCTGCAAGCCTCCCTGCTCAAGGTCCCCTATTTCGT 

CACCGCTGCCCTCGTGATGGCCCAACTGCTCAGGATTCCCCAAGCCATTCTGGATATGATTGCCGGAGCCCATTGGGGAG 

TGCTCGCCGGATGCAATACCTGTGTGACACAGACAGTGGATTTCTCCCTcGAcCCmCATTCACAATCGAAACCACM 

CTCCCCCAAGACGCTGTGTCCCACGGACCCACACCCCTCCTGTATAGGCTCGGCGCTGTGCAAAACGAAGTGACACTGAC 

ACACCCTGTGACAAAGTATATCATGACCTGTGCCAGAGTGGCTATCAAAAGCCTCACCGAAAGGCTCTACGTCGGCGGAC 

CCCTCACCAATAGCAGAGGCGAAAACTGTGGCTATAGGAGATGCGTCATCGGAGGCGCTGGCAATAACACACTGCATTGC 

CCTACCGATTGCTTTAGGAAACACCCTGAGGCTACCTATAGCAGATGCGGAACCTGTGGCTCCAGCGATCTGTATCTGGT 

CACCAGACACGCTGACGTCATCCCTGTGAGAAGGAGAGGCGATAGCAGAGGCTCCCTGCTCAACATGTGGTCCGGCACAT 

TCCCTATCAATGCCTATACCACAGGCCCTTGCACACCCCTCCCCGCTCCCAATTACACATTCGCTCTGTGGCACTCCACC 

GATGCCACAAGCATTCTGGGAATCGGAACCGTCCTGGATCAGGCTGAGACAGCCGGAGCCAGACTGGTCGTGCTCGCCAC 

ATACGTCCCCGAAAGCGATGCCGCTGCCAGAGTGACAGCCATTCTGTCCAGCCTCACCGTCACCCAACTGCTCAGGAGAC 

TGCATCAGTGGAGGCCTAGCTGGGGCCCTACCGATCCCAGAAGGAGAAGCAGAAACCTCGGCAAAGTGATTGACACACTG 

ACATGCGGATTCGCTGACCTCGGCCCTGACCAAAGGCCTTACTGTTGGCATTACCCTCCCAAACCCTGTGGCATTGTGCC 

TGCCAAAAGCGTCTGCGGACCCGTCTACTGTGAGGAATGCTCCCAGCATCTGCCTTACATTGAGCAAGGCATGATGCTCG 

CCGAACy^GTTTAAGCAAAAGGCTCTGGGACTGCTCCAGACATACCAAGCCACAGTGTGTGCCAGAGCCC^ 

CCTAGCTGGGACCay^TGTGGAAGTGTCTGATTAGGCTa^GCCTACCCTCTGCGGAATCGTCCCCGCTAAGTCCGTGTG 

TGGCCCTGTGTATTGCTTTACCCCTAGCCCTGTGGTCGTGGGAACCACAGACa^GAAGCGGAAGCTCCCTGACaiGTGAC^ 

AGCTCCTGAGAAG6CTCCai.CCAATGGATTAGCTCCGAGTGTACCACACCCTGTAGCGGAAGCTGGCTGAGAGACCTCAGC 

GATGGCTCCTGGTCCACCGTCAGCTCCGAGGCTGGCACAGAGGATGTGGTCTGCTGTAGCATGAGCTATAGCTGGACCGG 

ATGGGATCAGATGTGGAAATGCCTCATCAGACTGAAACCCACACTGCATG6CCCTACCCCTCTGCTCTACAGACTGGGAG 

CCGTCCAGAATCTGGCTGAGCAATTCAAACAGAAAGCCCTCGGCCTCCTGCAAACCGCTAGCAGACAGGCTGAGGTCATC 

GCTCCCGCTGTGCAAACCAATTGGCAAAAGCTCGAGGTCTTCTGGGCCAAACACATGTGGAATTTCATTAGCGGAATCCA 

ATACCTCGCCGGACTGTCCACCCTCCCCGGACTGATTGCCTTTGCCTCCAGGGGAAACCATGTGTCCCCCACACACTATG 

TGCCTGAGTCCGACGCTGCCGCTAGGGTCACCGCTATCCTCGCCACACTGTGTAGCGCTCTGTATGTGGGAGACCTCTGC 

GGAAGCGTCTTCCTCGTGGGACAGCTCTTCACATTCTCCCCCAGAAGGCATAGCTCCGTGCTCTGCGAATGCTATGACGC 

TGGCTGTGCCTGGTACGAACTGACACCCGCTGAGACAACCGTCAGGCTCAGGGCTTACATGGGCTGGGTGGCTGCCCAAC 

TGGCTGCCCCTGGCGCTGCCACAGCCTTTGTGGGAGCCGGACTGGCTGGCGCTGCCATTGGCTCCGTGGGAAGCTGGCAC 

ATTAACTCCACCGCTCTGAATTGCAATGAGTCCCTGAATACCGGATGGCTCGCCGGACTGTTTTACCAACACAAATTCAA 

TAACGCTCTGTCCAACTCCCTGCTCAGGCATCACAATCTGGTCTACTCCACCACAAGCAGAAGCGCTTGCCAAAGGCAAA 

AGAAAGTGACAGCCGCTATGTCCACCAATCCCAAACCCCAAAGGAAAACCAAAAGGAATACCAATAGGAGACCCCAAGAC 

GTCAAGTTTCCCGGAGGCGGAAGCCAAACCAAACAGTCCGGCGAAAACTTTCCCTATCTGGTCGCCTATCAGGCTACCGT 

CTGCGCTAGGGCTCAGGCTCCCCCTCCCTCCGCCCCTACCTATAGCTG6GGCGCTAACGATACCGATGTGTTTGTGCTCA 

ACAATACCAGACCCCCTCTGGGAAACTGGTTCGGATGCACAGTGCCTCCCCCTAGGAAAAAGAGAACCGTCGTGCTCACC 

GAAAGmCACTGTCCT^CCGCTCTGGCTGAGCTCGCCACaLAAGTCCTTCGGAAGCAa^yiLCCTCCAGGTCCG 

ACAGAAAAAGGTCACCTTTGACSVGACTGCAAGTGCTCGACTCCCACTATCAGGATGTGCTCGACCAAGCCGAAACCGCTG 

GCGCTAGGCTCGTGGTCCTGGCTACCGCTACCCCTCCCGGAAGCGTCACCGTCCCCCATCCCAATATCGAgTTtCATTAC 

GTCACCGGAATGACAACCGATAACCTCAAGTGTCCCTGTCAGGTCCCCTCCCCCGAgTTtTTTACCGAACTGGATGGCGT 

CCTGAAACTGACACCCATTGCCGCTGCCGGAAGGCTCGACCTCAGCGGATGGTTTACCGCTGGCTATAGCGGAGGCGATA 

TCTATCACTCCGCCTCCAGGCAAGCCGAAGTGATTGCCCCTGCCGTCCAGACAAACTGGCAGAAACTGGAAGTGTTTTG6 

GCTAAGCATATGTGGAACTTTTGCAGAGCCTCCGGCGTCCTGACAACCTCCTGCGGAAACACACTGACATGCTATATCAA 

AGCCAGAGCCGCTTGCAGAGCCGCTGGCCTCTTCGATAGGCTCCAGGTCCTGGATAGCCATTACCAAGACGTCCTGAAAG 

AGGTCAAGGCTGCCGCTAGCAAAGTGAAAGCCAATCTGCTCGGCCCTCTGACAAACTCCAGGGGAGAGAATTGCGGATAC 

AGAAGGTGTAGGGCTAGCGGAGTGCTCACCACAAGCTGTGGCAATACCCTCATCATGCACACAAGGTGTCACTGTGGCGC 

TGAGATTACCGGACACGTCAAGAATGGCACAATGAGAATCGTCGGCCCTAGGACATGCAGAGAGGTCAGCTTTAGGGTCG 

GCCTCCACGAATACCCTGTGGGAAGCCAACTGCCTTGCGAACCCGAACCCGATGTGGCTGTGCTCACCTCCAAGGAAGTG 

AAAGCCGCTGCCTCCAAGGTCAAGGCTAACCTCCTGTCCGTGGAAGAGGCTTGCTCCCTGACACCCCCTCACTCCGCCAA 

AGGCAGAGACGCTGTGATTCTGCTCATGTGTGTGGTCCACCCTACCCTCGTGTTTGACATTACCAAACTGCTCCTGGCTG 

TGTTTGGCCCTATGCTCACCGATCCCTCCCACATTACCGCTGAGGCTGCCGGAAGGAGACTGGCTAGGGGAAGCCCTCCC 

TCCATGGCTAGCTCCAGCGCTAGCCCTAGGCCTATCTCCTACCTCAAGGGAAGCTCCGGCGGACCCCTCCTGTGTCCCGC 

TGGCCATGCCGTCGGCATTTTCAGAGCCGCTGACTTTGACCAAGGCTGGGGCCCTATCTCCTACGCTAACG6AAGCGGAC 

CCGATCAGAGACCCTATTGCTGGCACTATCCCCCTAAGCCTAGGCATGTGGGACCCGGAGAGGGAGCCGTCCAGTGGATG 

AATAGGCrCATCGCTTTCGCTAGCAGAGGCAATCACGTCAGCCCTACCCATctcgagtgagaattcgcc 
Cassette C

ggcggatccaccatgctcgagTGCCTCTGGATGATGCTCCTGATTAGCCAAGCCGAAGCCGCTCTGGAAAACCTCGTGAT

TCTGAATGCCGCTAGCCTCGCCGGAACCCATATCATTCCCGATAGGGAAGTGCTCTACAGAGAGTTTGACGAAATGGAAG

AGTGTAGGCAACACCTCCCCTATATCGAACAGGGAATGATGCTGATTCACCTCCACCAAAACATTGTGGATGTGCAATAC 

CTCTACGGAGTGGGAAGCTCCATCGCTAGCTGGGCCATTAAGTGGGAGTATGTGTCCCACGCTAGGCCTAGGTGGTTCTG 

GTTCTGTCTGCTCCTGCTCGCCGCTGGCGTCGGCATTTACCTCCTGCCTAACAGAGCCGCTGCCGCTACCCTCGGCTTTG 

GCGCTTACATGAGCAAAGCCCATGGCATTGACCCTAACATTAGGACAGGCGTCAGGACAATCACAACCGGAAGGGTCCAG

GGACTGCTCAGGATTTGCGCTCTGGCTAGGAAAATGATTGGCGGACACTATGTGCAAATGGCTATCATTAAGCTCGGCGC

TAGGAGATTCGCTCAGGCTCTGCCTGTGTGGGCCAGACCCGATTACAATCCCCCTCTGGTCGAGACATGGAAAAAGCCTG 

ACTATGAGCCTACCGCTGCCCAAACCTTTCTGGCTACCTGTATCAATGGCGTCTGCTGGACCGTCTACCATGGCGCTGGC 

ACAAGGACAATCGCTAGCCCTTGGGCTCACAATGGCCTCAGGGATCTGGCTGTGGCTGTGGAACCCGTCGTGTTTAGCCA 

AATGGAAACCAAACTGATTACCTGGGGCGCTAAGGGACCCGTCATCCAAATGTATACCAATGTGGATCAGGATCTGGTCG 

GCTGGCCCGCTCCCCAAGGCTCCAGGTCCCTGACACCCTGTAAGGTCGTGATTCTG6ATAGCTTTGACCCTCTGGTCGCC 

GAAGAGGATGAGAGAGAGATTAGCGTCCCCGCTGAGATTCT6AGAAAGTCCCTGACAGGCACATACGTCTACAATCACCT 

CACCCCTCTGAGAGACTGGGCCCATAACGGACTGAGAGACCTCGCCGTCGCCGTCGAGCCTGTGTGTACCAGAGGCGTCG 

CC?iAAGCCGTgGAtTTTATCCCTGTGGAAAACCTCGAGACaACCATGAGGTCCCCCGTCTTC7^CAGACAATGCCCTCGG^ 

ATTAACGCTGTGGCTTACTATAGGGGACTGGATGTGTCCGTGATTCCCACAAGCGGAGACGTCGTGGTCGTGGCTACCGA 

TATGTCCGCCGATCTGGAAGTGGTCACCTCCACCTGGGTGCTCGTGGGAG6C6TCCTGGCTGCCCTC6CCGCTTACTGTC 

TGTCCACCGGAGCCCTCATGACAGGCTATACCGGAGACTTTGACTCCGTGATTGACTGTAACACATGCGTCACCCAAACC 

GTgGAtTTTAGCCTGGACCCTAACACAAACAGAAGGCCTCAGGATGTGAAATTCCCTGGCGGAGGCCAAATCGTCGGCGG 

AGTGTATCTGCTCCCCAGAAGGGGACCCAGAAGGGCTCTGGCTCACGGAGTGAGAGTGCTCGAGGATGGCGTCAACTATG 

CCACAGGCAATCTGCCTGGCTGTAGCTTTAGCATTTTCCTCAGCAAATTCGGATACGGAGCCAAAGACGTCAGGTGTCAC 

GCTAGGAAAGCCGTCGCCCATATCAATAGCGTCTGGAAAGACCTCCTGGAAACCCCTGGCGCTAAGCAAAACATTCAGCT 

CATCAATACCAATGGCTCCTGGCATATCAATAGCACAGCCCTCAACTGTAACGAAAGCCTCAACACAGGCTGGCTGGCTG 

GCCTCTTCTATCAGCATAAGTTTAACTCCAGCGGATGCCCTGAGAGACTGGCTAGCTGTAGGAGACTGACAGTGGTCCTG 

CTCTTCCTCCTGCTCGCCGATGCCAGAGTGTGTAGCTGTCTGTGGATGATGCTGCTCATCTCCCAGGCTGAGGCTGCCCT 

CGACTGTGAGATTTACGGAGCCTGTTACTCCATCGAACCCCTCGACCTCCCCCCTATCATTCAGAGACTGCATGGCCTCA 

GCGCTTTCTCCTGGACAGTGTATCACGGAGCCGGAACCAGAACCATTGCCTCCCCCAAAGGCCCTGTGATTCAGATGTAC 

ACAAACGTgGAtCAAGACCTCTACAGATTCGTCGCCCCTGGCGAAAGGCCTAGCGGAATGTTTGACTCCAGCGTCCTGTG 

TGAGTGTTACGATGCCGGAT6CGCTTGGTATAGGTCCGA6CTCAGCCCTCTGCTCCTGTCCACCACACAGTGGCAGGTCC 

TGCCTTGCTCCTTCACAACCCTCCCCGCTCTGTCCACCGGACTGA6AAAGCTCGGCGTCCCCCCTCTGAGAGCCTGGAGG 

CATAGGGCTAGGTCCGTGAGAGCCA6ACTGCTCGCCAGAGGCGGAAGGGCTAGCCCTCTGACAACCTCCCAGACACTGCT 

CTTCAATATCCTCGGCGGATGGGTCGCCGCTCAGCTCGCCGCTCCCGGAGCCGCTACCGCTCTGTGGATtCTCCAGGCTA 

6CCTCCTGAAA6TGCCTTACTTTGT6AGAGTGCAAGGCCTCCTGAGAATCTGTGCCCTCGCCAGAAAGATGGTGAAAAAC 

G6AACCATGA6GATTGTGGGACCCAGAACCTGTAGGAATATGTGGA6CGGAACCTTTCCCATTAACGCTTACACAACCGG 

AGAGGTCGCCCTCAGCACAACCGGAGAGATTCCCTTTTACGGAAA6GCTATCCCTCTG6AAGTGATTAAGGGAGGCAGAC 

ACCTCATCTTTCTGACAAGaGAcCCCACAACCCCTCTGGCTAGGGCTGCCTGGGAGACAGCCAGACACACACCCGTCAAC 

TCCTGGCTCGGCAATATCATTAGGGTCAGCGCTGAGGAATACGTCGAGATTAGGAGAGTGGGAGACTTTCACTATGTGAC 

AGGCATGACCACAGACAATCTGAAATGCCCTCCCGTCGTGCATGGCTGTCCCCTCCCCCCTCCCAGAAGCCCTCCCGTCC 

CCCCTCCCAGAAAGAAAAGGACAGTGGTCCTGACAGAGTCCACCCTCAGCACAGCCCTCGCCGAACTGGCTACCAAAAGC 

TTTGGCTCCAGCTCCACCTCCGGCATTACCGGAGACAATACCACAACCTCCGTGTCCTGCCAAAGGGGATACAAAGGCGT 

CTGGAGAGGCGATGGCATTATGCATACCAGATGCCATTGCGGAGCCGAAATCACAGGCCATGTGTTTCTGGTCGGCCAAC 

TGTTTACCTTTAGCCCTAGGAGACACTGGACCACACAGGGATGCAATTGCTCCATCTATCCCGGACACATTTTCGTCGGC 

GCTGGCCTCGCCGGAGCCGCTATCGGAAGCGTCGGCCTCGGCAAAGTGCTCGTGGATATCCTCGCCGGATACGGAGCCGG 

AGACATTTGGGATTGGATTTGCGAAGTGCTCAGCGATTTCAAAACCTGGCTGAAAGCCAAACTGATGCCCCAACTGCCTG 

GCATTCCCTTTAACTCCAGCATTGTGTATGAGGCTGCCGATGCCATTCTGCATACCCCTGGCTGTGTGCCTTGCGTCAGG 

GAAGGCAATGCCTCCAGGTGTAGCTCCGGCTGTCCCGAAAGGCTCGCCTCCTGCAGAAGGCTCACCGATTTCGATCAGGG 

ATGGGGACCCATTAGCTATGCCAATGGCTCCAGGACAGAG6AAGCCATTTACCAATGCTGTGACCTCGACCCTCAGGCTA 

GGGTCGCCATTAAGTCCCTGACAGAGAGACTGTATGTGGGAGTGTCCATiGGGATGGAGACTGCTCGCCCCTATCACAGCC 

TATGCCCAACAGACAAGGGGACTGCTCGGCTGTATCATTACCTCCCTGACATTCTTTAGCGTCCTGATTGCCAGAGACCA 

ACTGGAACAGGCTCTGGATTGCGAAATCTATGGCGCTTGCTATAGCATTGAGCCTCTGGATTGCCAAGTGCCTAGCCCTG 

AGTTTTTCACAGAGCTC6ACGGA6T6AGACTGCATAGGTTTGCCCCTCCCTGTAAGCCTCTGCTCAGGGAAACCTGTTAC 

ATTAAGGCTA6GGCTGCCTGTAGGGCTGCCGGACTGCAAGACTGTACCATGCTGGTCTGCGGAGACGATCTGGTCGTGAT 

TATCGATCCCAATATCAGAACCGGAGTGAGAACCATTACCACAGGCTCCCCCATTACCTATAGCACATACGGAAAGTTTC 

TGGCTGACGGATGCAATTGGACAAGGGGAGAGAGATGCGATCTGGAAGACAGAGACAGAAGCGAACTGTCCCCCCTCCTG 

CTCAGCACAACCCAATGGCAAACCGGACACAGAATGGCTTGGGATATGATGATGAATTGGTCCCCCACAGCCGCTCTG6T 

CATGGCTCAGCTCCTGAGAATCCCTCAGGCTCCCGGAGCCCTCGTGGTCGGCGTCGTGTGTGCCGCTATCCTCAGGAGAC 

ACGTCGGCCCTGGCGAAGGCGCTGTGCAATGGATGAACAGACACGATAGCCCTGACGCTGAGCTCATCGAAGCCAATCTG 

CTCTGGAGACAGGAAATGGGAGGCAATATCACAAGGGTCGAGTCCGAGAATGAGGGAGTGTTTACCGGACTGACACACAT 

TGACGCTCACTTTCTGTCCCAGACAAAGCAAAGCGGAGAGAATTTCCCTTACCTCGTGGCTAGGGGAAGGAGACAGCCTA 

TCCCTAAGGCTAGGAGACCCGAAGGCAGAACCTGGGCCCAACCCGGATACCCTTGGCCTCTGTATGGCAATCTGATTGTG 

TTTCCCGATCTGGGAGTGAGAGTGTGTGAGAAAATGGCTCTGTATGACGTCGTGTCCAAGCTCCCCCTCGCCGTCATGGG 

ATACGCTACCGGAAACCTCCCCGGATGCTCCTTCTCCATCTTTCTGCTCGCCCTCCTGTCCTGCCTCACCGTCCCCGCTA 

GCGCTTACCAACTGGGAAAGGTCCTGGTgGAtATTCTGGCTGGCTATGGCGCTGGCGTCGCCGGAGCCCTCGTGGCTTTC 

AAAATCATGAGCGGAGAGGTCAGCTATAGCTCCATGCCTCCCCTCGAGGGAGAGCCTGGCGATCCCGATCTGTCCGACGG 

AAGCTGGAGCACAGTGTCCAGCGAAGCCGGAAGGCAAGAGATGGGCGGAAACATTACCAGAGTGGAAAGCGAAAACAAAG 

TGGTCATCCTCGACTCCTTCGATCCCCTCGTGGCTGAGGAAGTGGGATGGCCTGCCCCTCAGGGAAGCAGAAGCCTCACC 

CCTTGO^CATGCGGAAGCTCCGACCTCTACCTCGTGACAAGGCATGCCGATGGCTGTAGCGGAGGCGCTTACGATATC^ 

TATCTGTGACGAATGCCATAGCACAGACGCTACCTCCATCCTCGGCATTGGCACAGTGCTC^CCTTTACCATTGAGACM 

CCACa.CTGCCTCAGGATGCCGTCAGC:aGAACCC2iAAGGAGAG6a^GAACCGGAAGGG6AAAGCCTGGCATTGACTGTO 

AGAAAGCATCCCGAAGCCACATACTCCAGGTGTGGCTCCGGCCCTTGGATTACCCCTAGGTGTCTGGTgGAtTATCCCTA 

TCTGGCTGCCGGAGTGGGAATCTATCTGCTCCCCAATAGGGCTGCCGCCCTCGTGACACCCTGTGCCGCTGAGGAACAGA 
AACTGCCTATCAATGCCCTCAGCAATAGCCTCCTGAGACACCATAACCTCGTGTATATCTCCAGCGAATGCACAACCCCT
TGCTCCGGCTCCTGGCTCAGGGATATCTGGGACTGGATCTGT6AGGTCGTGTCCGACTTTAAGACATGGCTCAAGGCTAA 
GCTCATGCCTCAGCTCCCCGGAATCCCTTTCGTCAGCTGTCAGAGAGGCTATAAGGGAGTGTGGAGGGGAGACGGAAGCG 
GACCCTGGATCACACCCAGATGCCTCGTGGATTACCCTTACAGACTGTGGCACTATCCCTGTACCATTAACTATACCATT 
TTCAAAagatctTGAgtcgacgaattcgcc
Melanoma Savine design

Two Savines:
- one containing scrambled melanocyte differentiation antigens (Ags)
- one containing scrambled melanoma cancer-specific Ags

Genes in melanocyte differentiation Savine 

gp100

MDLVLKRCLLHLAVIGALLAVGATKVPRNQDWLGVSRQLRTKAlAnsrRQLyPEWTEAQRL^ 
GANASFSIALNFPGSQKVIjPDGQVIWVJmTIINGSQVWGGQPWPQETD 

GQYWQVLGGPVSGIiSIGTGRAMLGTHTMEVTVYHRRGSRSWPLAHSSSAFTITDQVPFSVSVSQLRABDGGNKH 

NQPLTFALQLHDPSGYLAEADLSYTWDFGDSSGTLISRALWTHTYLEPGPVTAQVVIjQAAIPLTSCGSS^^ 

HRPTAEAPNTTAGQVPTTEWGTTPGQAPTAEPSGTTSVQVPTTEVISTAPVQMPTAESTGMTPEKVPVSEVMGTTLA 

EMSTPEATGMTPAEVSIWLSGTTAAQVTTTEWVETTARELPIPEPEGPDASSIMSTESITGSLGPIiLDGTATIiRLVK 

RQVPLDCVLYRYGSFSVTLDIVQGIESAEILQAVPSGEGDAFELTVSCQGGLPKEAOyiEISSPGCQPP^^ 

SPACQLVLHQILKGGSGTYCLNVSLiADTNSIjAWSTQIjIMPGQEAGLG^^ 

FSVPQLPHSSSHWLRLPRIFCSCPIGEWSPLLSGQQV 

MART 

MPREDAHFIYGYPKKGHGHSYTTAEEAAGIGILTVILGVLLLIGCWYCRRRNG1^^AL^^ 
FDHRDSKVSLQEKNCEPWPNAPPAYEKIiSAEQSPPPYSP 

TRP-1 

PAFLTWHRYHLLRLEKDMQEMLQEPSFSLPYWNFATGKWCDICTDDLMGSRSMFDSTLISPNSVFSQWRWC^ 
YDTLGTLlCNSTEDGPIRRNPAGWARP^TV^QRLPEPQDVAQCLEVGLFDTPPFYSNSTMSFRNTVEGYSDPTGKYDPAV 
RSLHNLAHLFLNGTGGQTHLSSQDPIFVLLHTFTDAVFDEWLRRYNADISTFPLENAPIGHNRQYm 
MFVTAPDNLGYTYE 

Tyros 

MLLAVLYCLIiWSFQTSAGHFPRACVSSKNLMEKECCPPWSGDRSPCGQLSGRGSCQMILLSNAPLGPQFPFTGVDDRE 

SWPSVFYHRTCQCSGNFMGFNCGNCKF'GFWGPNCTERRLLVRRNIFDLSAPEKDKFFAYLTIjAK^ 

GQMKNGSTPMFimiNIYDIjFVWMHYYVSMDALLGGSEIWRDIDFAHEAP 

YWDWRDAEKCDICTDEYMGGQHPTNPNLLSPASFFSSWQIVCSRLEEYlSrSHQSLCNGTPEGPLRRNPGNH 
PSSADVEFCLSLTQYESGSMDKAANFSFRNTLEGFASPLTGIADASQSSMHNALHIYIWGTMSQVQGSAI^ 
AFVDSIFEQWLQRHRPLQEVYPEANAPIGHl^RESYIWPFIPLYRNGDFFISSKDLGYDYSYIiQDSDP^ 
EQASRIWSWLLGAAMVGAVLTALLAGLVSLI^CRHKRKQLPEEKQPIiLMEKEDYHSLYQSHL 

TRP2 

MSPLWWGFLIiSCLGCKILPGAQGQFPRVCMTVDSLVlSrKECCPRLGAESAWCGSQQGRG^ 

NQDDRELWPRKFFHRTCKCTGNFAGYNCGDCKFGWTGPNCERKKPPVIRQNIHSLSPQEREQFLGALDLAKKRVHPDY 

VITTQHWLGLLGPNGTQPQFANCSWDFFWLHYYSVRDTLLGPGRPYRAIDFSHQGPAFVTWHRYHLLCLERDLQRL 

IGNESFALPYWNFATGRNECDVCTDQLFGAARPDDPTLISRNSRFSSWETVCDSLDDYiraLWLaSfGTYEGL^ 

GRNSMKliPTLKDIRDCLSLQKFDNPPFFQNSTFSFRNALEGFDKADGTLDSQVMSLH]Sr^ 

IFVVia:SFTDAIFDEWMKRFNPPADAWPQELAPIGHNRMYN^ 

WPTTLLVVMGTLVALVGLFVLLAFLQYRRLRKGYTPLMETHLSSKRYTEBA 

MC1R

MAVQGSQRRLLGSLNSTPTAIPQLGLAANQTGARCLEVSISDGLFLSLGLVSLVENAIiWATIAKW^ 
CLALSDLLVSGTNVLETAVILLLEAGALVARAAVLQQLDWIDVITCSSMIiSSLCFLGAIAV^ IV 

TLPRAPRAVAAIWASWFSTLFIAYYDHVAVLLCLWFFLAMIiVLMAVLYV^ 

FGLKGAVTLTlLLGIFFLCWGPFFLHbTLIVLCPEHPTCGCIFKHFNLFLAIiIICNAIIDPLIYAFHSQELRRTLKEV 
LTCSW 

MUC1F

MTPGTQSPFFLLLLLTVLTWTGSGHASSTPGGEKETSATQRSSVPSSTEKITAVSMTSSVIiSSHSPGSGSSTTQG 
TLAPATEPASGSAATWGQDVTSVPVTRPALGSTTPPAHDVTSAPDNK 
MUC1R

NRPALGSTAPPVHIWTSASGSASGSASTLVHNGTSARATTTPASKSTPFSIPSHHSDTPTTLASHSTK^ 
VPPLTSSNHSTSPQLSTGVSFFFLSFHISNLQFNSSLEDPSTDYYQELQRDISEMFLQIYKQGGFLGLSNIKFRPGSV 
WQLTLAFREGTI]S^VI^DVETQFNQYKTEAASRYNLTISDVSVSDVPFPFSAQSGAGVPGWGIALLVIJVCVLVALAIVY 
LIALAVCQCRRKlSrYGQLDIFPARDTYHPMSEYPTYHTHGRYVPPSSTDRSPYEKVSAGNGGSSLSYTNPAV^ 



NB: Muc 1 repeat sequences in the middle of the gene were removed



Genes in melanoma specific Savine

BAGE 

MAARAVFIiAIiSAQLLQARLMKEESPWSWRLEPEDGTAIj.CFIF 
GAGE-1 

MSWRGRSTYRPRPRRYVEPPEMIGPMRPEQFSDEVEPATPEEGEPATQRQDPAAAQEGEDEGASAGQGPKPEADSQEQ 
GHPQTGCECEDGPDGQEMDPPNPEEVKTPEEEMRSHYVAQTGILWLLMimCPLN^ 

gp100In4

SWSQKRSFVYVWKTWGEGLPSQPIIHTCVYFFLPDHLSFGRPFHIiNFCDFIi 
MAGE-1 

MSLEQRSLHCKPEEALEAQQEADGLVCVQAATSSSSPLVIiGTLEEVPTAGSTDPPQSPQGASAFPTTINFTRQRQPSE 
GSSSREEEGPSTSCILESLFRAVITKKVADLVGFLLLKYRAREPVTKAEMLESVIKNYKHCFPEIFG^ 
IDVKEADPTGHSYVIiVTCLGLSYDGLLGDNQIMPKTGFLIIVLVMIAMEGGHAPEEEIWEEIiSVMEV^ 
PRKLLTQDLVQBKYLEYRQVPDSDPARYEFLWGPRALAETSYVKVLEYVIKVSARVRFFFPSLRE^ 

MAGE-3

MPLEQRSQHCKPEEGLEARGEAIiGLVGAQAPATEEQEAASSSSTLVEVTLGEVPAAESPDPPQSPQGASSLPTTMNYP 

LWSQSYEDSSNQEEEGPSTFPDLESEFQAALSRKVAELVHFLLIiKYRAREPVTKAEMLGSWGlTO 

SLQLVFGIELMEVDPIGHLYIFATCLGLSYDGLLGDNQIMPKAGLLIIVIiAIIAREGDCAPEEKIWEELSVIiE^ 

EDSILGDPKKLLTQHFVQENYIiEYRQVPGSDPACYEFLWGPRALVETSYVK^LHHiy^ 

EE 

PRAME

MERRRLWGSIQSRYISMSWTSPRRLVELAGQSIIBKDEAIlAIAALELLPRSIIFPPLFMAAFDGRHSQTLKA^^ 

TCLPLGVLMKGQHLHLETFKAVLDGLDVLIiAQEWPRRWKIiQVLDLRKNSHQDFWTWSGl^ 

KKRKVDGLSTEAEQPFIPVEVLVDLFLKEGACDELFSYLIEKVKRKKI^ 

IEDLEVTCTWKI.PTLAKFSPYLGQMINLRRLLIiSHIHASSYISPEKEEQYIAQFTSQFLSLQCLQALYVDSLFFL^ 
LDQLLRHVMNPLETIiSITNCRLSEGDVMHLSQSPSVSQLSVLSLSGVMLTDVSPEPLQAIiLERASAT^ 
TDDQLLALLPSLSHCSQLTTLSFYGNSISISALQSLLQHLIGIiSlSrLTHVLYPVPLESYEDIHGTLHIiE^^ 
ELLCELGRPSMVWLSANPCPHCGDRTFYDPEPIIiCPCFMPN 

TRP2IN2 

LMETHLSSKRYTEEAGGFFPWLKVYYYRFVIGLRWQWEVISCKLIKE^TTRQP 
NYNSO1a

MQAEGRGTGGSTGDABGPGGPGIPDGPGGNAGGPGEAGATGGRGPRGAGAARASGPGGGAPRGPHGGAASGLNGCCRC 

GARGPESRLLEFYLAMPFATPMEAEIiARRSIAQDAPPIiPVPGVLLKEFTVSGNILTXRLTAADHRQL^ 

SLLMWITQCFLPVFLAQPPSGQRR

NYNSO1b

MDMAQEALAFIiMAQGAMLAAQERRVPRAAEVPGAQGQQGPRGREEAPRGVRMI^ARLQG 
LAGE1

MQAEGQGTGGSTGDADGPGGPGIPDGPGGNAGGPGEAGATGGRGPRGAGAARASGPRGGAPRGPHGGAASAQDGRCPC
GARRPDSRLLQLHITMPFSSPMEAELVRRILSRDAAPLPRPGAVLKDFTVSGNLLFIRLTAADHRQLQLSISSCLQQL
SLLMWITQCFLPVFLAQAPSGQRR



Differentiation Savine Scramble process

Disease name : melanoma
Input filename : Diffmucg.txt
Output filename : Diffmucs.txt
Number genes : 8
Number segments : 187
Segment length : 30
Segment overlap : 15
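The parameters above (segment length 30, segment overlap 15) imply that consecutive segments start 15 residues apart, which agrees with the offsets 1, 16, 31, ... in the records that follow. The Python sketch below illustrates one way such a segmentation could be computed. It is not the program used to generate this listing: the function name `overlapping_segments` is hypothetical, and the padding of each gene product with two alanine residues ("AA") at both ends is an assumption inferred from the first and last segment of each gene.

```python
def overlapping_segments(protein, length=30, overlap=15, pad="AA"):
    """Cut a padded protein into overlapping segments.

    Returns a list of (offset, segment) pairs, where offset is the
    1-based start position within the padded sequence.
    Assumption: 'pad' is added to both ends before cutting.
    """
    padded = pad + protein + pad
    step = length - overlap          # distance between segment starts
    segments = []
    start = 0
    while True:
        segments.append((start + 1, padded[start:start + length]))
        if start + length >= len(padded):
            break                    # the last segment may be shorter
        start += step
    return segments


if __name__ == "__main__":
    # Arbitrary 118-residue placeholder sequence, not a real protein.
    demo = "".join(chr(65 + i % 26) for i in range(118))
    for offset, segment in overlapping_segments(demo):
        print(offset, len(segment), segment)
```

With a 118-residue input the sketch yields eight segments, the last being 17 residues long at offset 106, which is the pattern shown for the MART records in this listing.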



Segments in original order: 



Gene : gp100

Segment # : 1
Offset : 1

1st Codon : 1

AAMDLVLKRCLLHLAVIGALLAVGATKVPR
GCCGCTATGGATCTGGTCCTGAAAAGGTGTCTGCTCCACCTCGCCGTCATCGGAGCCCrCCTGGCTGTGGGAGCC^ 

Gene : gp100

Segment # : 2
Offset : 16
1st Codon : 1

VIGALLAVGATKVPRNQDWLGVSRQLRTKA
GTGATTGGCGCTCTGCTCGCCGTCGGCGCTACCAAAGTGCCrAGGAATCffi.GGATTGGCTCG6CGTCAGCAGAm 

Gene : gp100

Segment # : 3
Offset : 31
1st Codon : 1

NQDWLGVSRQLRTKAWNRQLYPEWTEAQRL
AACCAAGACTGGCTGGGAGTGTCCAGGCAACTGAGAACCAAAGCCTGGAACAGACAGCTCTACCCTGAGTGGACCGAAGCCCAAAGGCTC

Gene : gp100

Segment # : 4
Offset : 46

1st Codon : 1

WNRQLYPEWTEAQRLDCWRGGQVSLKVSND
TGGAATAGGCAACTGTATCCCGAAT6GACAGAGGCTCAGAGACTGGATTGCTGGAGGGGAGGCCAAGTGTCCCTGAAAGTGTCCAACGAT 

Gene : gp100

Segment # : 5
Offset : 61
1st Codon : 1

DCWRGGQVSLKVSNDGPTLIGANASFSIAL
GACTGTTGGAGAGGCGGACAGGTCAGCCTCAAGGTCAGCAATGACGGACCCACACTGATTGGCGCTAACGCTAGCTTTAGCATTGCCCTC 

Gene : gp100

Segment # : 6
Offset : 76

1st Codon : 1

GPTLIGANASFSIALNFPGSQKVLPDGQVI
GGCCCTACCCTCATCGGAGCCAATGCCTCCTTCTCCATCGCTCTGAATTTCCCTGGCTCCCAGAAAGTGCTCCCCGATGGCCAAGTGATT 

Gene : gp100

Segment# : 7 
Offset : 91 

1st Codon : 1 

NFPGSQKVLPDGQVIWVNNTIINGSQVWGG 
AACTTTCCCGGAAGCCAAAAGGTCCTGCCTGACGGACAGGTCATCTGGGTGAATAACACAATCATTAACGGAAGCCAAGT6TGGGGCGGA 



Gene : gp100

Segment # : 8

Offset : 106

1st Codon : 1

WVNNTIINGSQVWGGQPVYPQETDDACIFP
TGGGTCAACAATACCATTATOiATGGCTCCaVGGTCTGGGGAGGCCAACCCGTCTACCCTCAGGAAACCGATGACGCT 

Gene : gp100

Segment # : 9
Offset : 121

1st Codon : 1

QPVYPQETDDACIFPDGGPCPSGSWSQKRS 
CAGCCTGTGTATCCCCAAGA6ACAGACGATGCCTGTATCTTTCCCGATGGCGGACCCTGTCCCTCCGGCTCCTGGTCCCAGAAAAGGTCC 

Gene : gp100

Segment # : 10
Offset : 136
1st Codon : 1

DGGPCPSGSWSQKRSFVYVWKTWGQYWQVL
GACGGAGGCCCTTGCCCTAGCGGAAGCTGGAGCCAAAAGAGAAGCTTTGTGTATGTGTGGAAGACATGGGGACAGTATTGGCAAGTGCTC 

Gene : gp100

Segment # : 11 
Offset : 151 
1st Codon : 1 

FVYVWKTWGQYWQVLGGPVSGLSIGTGRAM 
TTCGTCTACGTCTGGAAAACCTGGGGCCAATACT6GCAGGTCCTGGGAGGCCCTGTGTCCGGCCTCAGCATTGGCACA6GCAGAGCCATG 

Gene : gp100

Segment # : 12
Offset : 166

1st Codon : 1

GGPVSGLSIGTGRAMLGTHTMEVTVYHRRG
GGCGGACCCGTCAGCGGACTGTCCATCGGAACCGGAAGG6CTATGCTCGGCACACACACAATGGAAGTGACAGTGTATCACAGAAGGGGA 

Gene : gp100

Segment # : 13
Offset : 181

1st Codon : 1

LGTHTMEVTVYHRRGSRSYVPLAHSSSAFT
CrGGGAACCCATACCATGGAGGTCACCGTCTACCATAGQAGAGGCTCCAGGTCCTACGTCCCCCTCGCCCATAGCTCCAGCGCTTTCACA 

Gene : gp100

Segment # : 14
Offset : 196
1st Codon : 1

SRSYVPLAHSSSAFTITDQVPFSVSVSQLR
AGCAGAAGCTATGTGCCTCTGGCTCACTCCAGCTCCGCCTTTACCATTACCGATCAGGTCCCCTTTAGCGTCAGCGTCAGCCAACTGAGA 

Gene : gp100

Segment # : 15
Offset : 211

1st Codon : 1

ITDQVPFSVSVSQLRALDGGNKHFLRNQPL
ATCACAGACCAAGTGCCTTTCTCCGT6TCCGTGTCCCAGCTCA6GGCTCTGGATGGCGGAAACAAACACTTTCTGAGAAACCAACCCCTC 

Gene : gp100

Segment # : 16
Offset : 226

1st Codon : 1

ALDGGNKHFLRNQPLTFALQLHDPSGYLAE
GCCCTCGACGGAGGCAATAAGCATTTCCTCAGGAATCAGCCTCTGACATTCGCTCTGCAACTGCATGACCCTAGCGGATACCTCGCCGAA 

Gene : gp100

Segment # : 17
Offset : 241 

1st Codon : 1 

TFALQLHDPSGYLAEADLSYTWDFGDSSGT 
ACCTTTGCCCTCCAGCTCCACGATCCCTCCGGCTATCTGGCTGAGGCTGACCTCAGCTATACCTGGGACTTTGGCGATAGCTCCGGCACA 

Gene : gp100

Segment # : 18
Offset : 256 

1st Codon : 1 

ADLSYTWDFGDSSGTLISRALVVTHTYLEP 
GCCGATCTGTCCTACACATGGGATTTCGGAGACTCCAGCGGAACCCTCATCTCCAGGGCTCTGGTCGTGACACACACATACCTCGAGCCT

Gene : gp100

Segment # : 19
Offset : 271
1st Codon : 1

LISRALVVTHTYLEPGPVTAQVVLQAAIPL
CTGATTAGCAGAGCCCTCGTGGTCACCCATACCTATCTGGAACCCGGACCCGTCACCGCTCAGGTCGTGCTCCAGGCTGCCATTCCCCTC 

Gene : gp100

Segment # : 20
Offset : 286

1st Codon : 1

GPVTAQVVLQAAIPLTSCGSSPVPGTTDGH
GGCCCTGTGACaGCCCAAGTGGTCCTGCUWWSCCGCTATCCCTCTGACAAGCTGTGGCTCCAGCCCTGTGCCTGGCACAAC 

Gene : gp100

Segment # : 21 
Offset : 301 
1st Codon : 1 

TSCGSSPVPGTTDGHRPTAEAPNTTAGQVP 
ACCTCCTGCGGAAGCTCCCCCGTCCCCGGAACCACAGACGGACACA6ACCCACAGCCGAAGCCCCTAACACAACCGCTGGCCAAGTGCCT 

Gene : gp100

Segment # : 22
Offset : 316
1st Codon : 1 

RPTAEAPNTTAGQVPTTEVVGTTPGQAPTA 
AGGCCTACCGCTGAGGCTCCCAATACCACAGCCGGACAGGTCCCCACAACCGAAGTGGTCGGCACAACCCCTGGCCAAGCCCCTACCGCT 

Gene : gp100

Segment # : 23
Offset : 331
1st Codon : 1

TTEVVGTTPGQAPTAEPSGTTSVQVPTTEV 
ACCACAGAGGTC6TGGGAACCACACCCGGACAGGCTCCCACAGCCGAACCCTCCGGCACAACCTCCGTGCAAGTGCCTACCACAGAG6TC 

Gene : gp100

Segment # : 24 
Offset : 346 

1st Codon : 1 

EPSGTTSVQVPTTEVISTAPVQMPTAESTG 
GAGCCTAGCGGAACCACAAGCGTCCA6GTCCCCACAACCGAAGTGATTAGCACAGCCCCTGTGCAAATGCCTACCGCTGAGTCCACCGGA 

Gene : gp100

Segment # : 25
Offset : 361
1st Codon : 1

ISTAPVQMPTAESTGMTPEKVPVSEVMGTT
ATCTCCACCGCTCCCGTCCAGATGCCCACAGCCGAAAGCACAGGCATGACCCCTGAGAAAGTGCCTGTGTCCGAGGTCATGGGAACCACA 

Gene : gp100

Segment # : 26
Offset : 376

1st Codon : 1

MTPEKVPVSEVMGTTLAEMSTPEATGMTPA
ATGACACCCGAAAAGGTCCCCGTCAGCQAAGTGATGGGCACAACCCTCGCCGAAATGTCCACCCCTGAGGCTACCGGAATGACACCCGCT 

Gene : gp100

Segment# : 27 
Offset : 391 
1st Codon : 1 

LAEMSTPEATGMTPAEVSIVVLSGTTAAQV 
CTGGCTGA6ATGAGCACACCCGAAGCCACAGGCAT6ACCCCTGCCGAA6TGTCCATCGTCGTGCTCAGC6GAACCACAGCCGCTCAGGTC 

Gene : gp100

Segment # : 28
Offset : 406
1st Codon : 1

EVSIVVLSGTTAAQVTTTEWVETTARELPI
GAGGTCAGCATTGTGGTCCT6TCCGGCACAACCGCTGCCCAAGTGACAACCACAGAGTGGGTGGAAACCACAGCCAGAGAGCTCCCCATT 

Gene : gp100

Segment # : 29
Offset : 421

1st Codon : 1

TTTEWVETTARELPIPEPEGPDASSIMSTE
ACCACAACCGAATGGGTCGAGACAACCGCTAGGGAACTGCCTATCCCTGAGCCTGAGGGACCCGATGCCTCCAGCATTATGTCCACCGAA 

Gene : gp100

Segment # : 30
Offset : 436

1st Codon : 1

PEPEGPDASSIMSTESITGSLGPLLDGTAT
CCCGAACCCGAAGGCCCTGACGCTAGCTCCATCATGAGCACAGAGTCCATCACAGGCTCCCTGGGACCCCTCCTGGATGGCACAGCCACA 

Gene : gp100

Segment # : 31
Offset : 451
1st Codon : 1

SITGSLGPLLDGTATLRLVKRQVPLDCVLY
AGCATTACCGGAAGCCTCGGCCCTCTGCTCGACGGAACCGCTACCCTCAGGCTCGTGAAAAGGCAAGTGCCTCTGGATTGCGTCCTGTAT 

Gene : gp100

Segment # : 32
Offset : 466

1st Codon : 1

LRLVKRQVPLDCVLYRYGSFSVTLDIVQGI
CTGAGACTGGTCAAGAGACAGGTCCCCCTCGACTGTGTGCTCTACAGATACGGAAGCTTTAGCGTCACCCTCGACATTGTGCAAGGCATT 

Gene : gp100

Segment # : 33
Offset : 481
1st Codon : 1

RYGSFSVTLDIVQGIESAEILQAVPSGEGD
AGGTATGGCTCCTTCTCCGTGACACTGGATATCGTCCAGGGAATCGAAAGCGCTGAGATTCTGCAA6CCGTCCCCTCCGGCGAAGGCGAT 

Gene : gp100

Segment # : 34
Offset : 496

1st Codon : 1

ESAEILQAVPSGEGDAFELTVSCQGGLPKE
GAGTCCGCCGAAATCCTCCAG6CTGTGCCTAGCGGAGAGGGAGACGCTTTCGAACTGACAGT6TCCTGCCAAGGCGGACTGCCTAAGGAA 

Gene : gp100

Segment # : 35
Offset : 511
1st Codon : 1

AFELTVSCQGGLPKEACMEISSPGCQPPAQ
GCCTTTGAGCTCACCGTCAGCTGTCAGGGAGGCCTCCCCAAAGAGGCTTGCATGGAGATTAGCTCCCCCGGATGCCAACCCCCTGCCCAA 

Gene : gp100

Segment# : 36 
Offset : 526 

1st Codon : 1 

ACMEISSPGCQPPAQRLCQPVLPSPACQLV 
GCCTGTATGGAAATCTCCAGCCCTGGCTGTCAGCCTCCCGGTCAGAGACTGTGTCAGCCTGTGCTCCCCTCCCCCGCTTGCCAACTGGTC 

Gene : gp100

Segment # : 37
Offset : 541 

1st Codon : 1 

RLCQPVLPSPACQLVLHQILKGGSGTYCLN 
AGGCTCTGCCAACCCGTCCTGCCTAGCCCTGCCTGTCAGCTCGTGCTCCACCAAATCCTCAAGGGAGGCTCCGGCACATACTGTCTGAAT 

Gene : gp100

Segment # : 38
Offset : 556

1st Codon : 1

LHQILKGGSGTYCLNVSLADTNSLAVVSTQ 
CTGCATCAGATTCTGAAAGGCGGAAGCGGAACCTATTGCCTCAACGTCAGCCTCGCCGATACCAATAGCCTCGCCGTCGTGTCCACCCAA 

Gene : gp100

Segment # : 39
Offset : 571
1st Codon : 1

VSLADTNSLAVVSTQLIMPGQEAGLGQVPL
GTGTCCCTGGCTGACACAAACTCCCTGGCTGTGGTCAGCACACAGCTCATCATGCCCGGACAGGAAGCCGGACTGGGACAGGTCCCCCTC 



Gene : gp100

Segment # : 40
Offset : 586

1st Codon : 1

LIMPGQEAGLGQVPLIVGILLVLMAVVLAS
CTGATTATGCCTGGCCAAGAGGCTGGCCTCGGCCAAGTGCCTCTGATTGTGGGAATCCTCCTGGTCCTGATGGCCGTCGTGCTCGCCTCC 



Gene : gp100

Segment # : 41
Offset : 601

1st Codon : 1

IVGILLVLMAVVLASLIYRRRLMKQDFSVP
ATCGTCGGCATTCTGCTCGTGCTCATGGCTGTGGTCCTGGCTAGCCTCATCTATAGGAGAAGGCTCATGAAACA6GATTTCTCCGTGCCT 

Gene : gp100

Segment # : 42
Offset : 616

1st Codon : 1 

LIYRRRLMKQDFSVPQLPHSSSHWLRLPRI 
CTGATTTACAGAAGGAGACTGATGAAGCAAGACTTTAGCGTCCCCCAACTGCCTCACTCCAGCTCCCACTGGCTGAGACTGCCTAGGATT 

Gene : gp100

Segment # : 43
Offset : 631

1st Codon : 1

QLPHSSSHWLRLPRIFCSCPIGENSPLLSG
CAGCTCCCCCATAGCTCCAGCCATTGGCTCAGGCTCCCCAGAATCTTTTGCTCCTGCCCTATCGGA6AGAATAGCCCTCTGCTCAGCGGA 

Gene : gp100

Segment # : 44
Offset : 646 
1st Codon : 1 

FCSCPIGENSPLLSGQQVAA 
TTCTGTAGCTGTCCCATTGGCGAAAACTCCCCCCTCCTGTCCGGCCAACAGGTCGCCGCT 

Gene : MART

Segment # : 1
Offset : 1 
1st Codon : 1 

AAMPREDAHFIYGYPKKGHGHSYTTAEEAA 
GCCGCTATGCCTAGGGAAGACGCTCACTTTATCTATGGCTATCCCAAAAAGGGACACGGACACTCCTACACAACCGCTGAGGAAGCCGCT 

Gene : MART

Segment # : 2
Offset : 16 

1st Codon : 1 

KKGHGHSYTTAEEAAGIGILTVILGVLLLI 
AAGAAAGGCCATGGCCATAGCTATACCACAGCCGAAGAGGCTGCCGGAATCGGAATCCTCACCGTCATCCTCGGCGTCCTGCTCCTGATT 

Gene : MART 

Segment # : 3
Offset : 31
1st Codon : 1

GIGILTVILGVLLLIGCWYCRRRNGYRALM
GGCATTGGCATTCTGACAGTGATTCTGGGAGTGCTCCTGerCATCGGATGCTGGTACTGTAGGAGAAGGAATGGCTATAGGG^ 

Gene : MART 

Segment# : 4 
Offset : 46 

1st Codon : 1 

GCWYCRRRNGYRALMDKSLHVGTQCALTRR
GGCTGTTGGTATTGCAGAAGGAGAAACGGATACAGAGCCCTCATGGATAAGTCCCTGCATGTGGGAACCCAATGCGCTCTGACAAGGAGA 

Gene : MART 

Segment # : 5
Offset : 61

1st Codon : 1

DKSLHVGTQCALTRRCPQEGFDHRDSKVSL
GACAAAAGCCTCCACGTCGGCACACAGTGTGCCCTCACCT^GAAGGTGTCCCCAAGAGGGATTCGATCACAGAGACTCCA^

Gene : MART 

Segment # : 6
Offset : 76

1st Codon : 1

CPQEGFDHRDSKVSLQEKNCEPVVPNAPPA 
TGCCCTCaVGGAAGGCTTTGACCATAGGGATAGCSiAAGTGTCCCn^GCAAGAGAAAAACTGTGAGCCTGTGGTCCC 

Gene : MART 

Segment # : 7
Offset : 91
1st Codon : 1

QEKNCEPVVPNAPPAYEKLSAEQSPPPYSP
CAGGAAAAGAATTGCGAACCCGTCGTGCCTAACGCTCCCCCTGCCTATGAGAAACTGTCCGCCGAACAGTCCCCCCCTCCCTATAGCCCT 

Gene : MART 

Segment # : 8
Offset : 106
1st Codon : 1

YEKLSAEQSPPPYSPAA
TACGAAAAGCTCAGCGCTGAGCAAAGCCCTCCCCCTTACTCCCCCGCTGCC 
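
Each record in this listing pairs a peptide segment with a nucleotide sequence that encodes it, so a record can be checked for transcription damage by translating the nucleotide line and comparing the result with the peptide line. The Python sketch below is offered only as an illustration of such a check. The helper name `translate` is hypothetical, and the use of the standard genetic code is an assumption, since the listing does not state which codon table was used. Codons containing unreadable characters are reported as "?".

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a nucleotide string in frame 1; unknown codons give '?'."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3   # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(dna[i:i + 3], "?")
                   for i in range(0, usable, 3))


if __name__ == "__main__":
    # Nucleotide line of MART segment 8 as printed in this listing.
    mart_8 = "TACGAAAAGCTCAGCGCTGAGCAAAGCCCTCCCCCTTACTCCCCCGCTGCC"
    print(translate(mart_8))
```

Applied to the nucleotide line of MART segment 8, the function returns YEKLSAEQSPPPYSPAA, i.e. the 17-residue peptide of that record.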

Gene : TRP-1 

Segment # : 1
Offset : 1

1st Codon : 1

AAPAFLTWHRYHLLRLEKDMQEMLQEPSFS
GCCGCTCCCGCTTTCCTCACCTGGCACAGATACCATCTGCTCAGGCTCGAGAAAGACATGCAGGA2\AT6CTCCAGGAACCCTCCTTCTCC 

Gene : TRP-1 

Segment # : 2
Offset : 16
1st Codon : 1

LEKDMQEMLQEPSFSLPYWNFATGKNVCDI
CTGGAAAAGGATATGCAAGAGAT6CTGCAAGAGCCTAGCTTTAGCCTCCCCTATTGGAATTTCGCTACCGGAAAGAATGTGTGTGACATT 

Gene : TRP-1

Segment # : 3
Offset : 31
1st Codon : 1

LPYWNFATGKNVCDICTDDLMGSRSNFDST
CTGCCTTACTGGAACTTTGCCACAGGCAAAAACGTCTGCGATATCTGTACCGATGACCTCATGGGAAGCAGAAGCAATTTCGATAGCACA 

Gene : TRP-1 

Segment # : 4
Offset : 46 
1st Codon : 1 

CTDDLMGSRSNFDSTLISPNSVFSQWRVVC 
TGCACAGACGATCTGATGGGCTCCAGGTCCAACTTTGACTCCACCCTCATCTCCCCCAATAGCGTCTTCTCCCAGTGGAGGGTCGTGTGT 

Gene : TRP-1 

Segment # : 5
Offset : 61
1st Codon : 1

LISPNSVFSQWRVVCDSLEDYDTLGTLCNS
CTGATTAGCCCTAACTCCGTGTTTAGCCAATGGAGAGTGGTCTGCGATAGCCTCGAGGATTACGATACCCTCGGCACACTGTGTAACTCC 

Gene : TRP-1 

Segment # : 6
Offset : 76
1st Codon : 1

DSLEDYDTLGTLCNSTEDGPIRRNPAGNVA
GACTCCCTGGAAGACTATGACACACTGGGAACCCTCTGCAATAGCACAGAGGATGGCCCTATCAGAAGGAATCCCGCTGGCAATGTGGCT 

Gene : TRP-1

Segment # : 7
Offset : 91 
1st Codon : 1 

TEDGPIRRNPAGNVARPMVQRLPEPQDVAQ 
ACCGAAGACGGACCCATTAGGAGAAACCCTGCCGGAAACGTCGCCAGACCCATGGTGCAAAGGCTCCCCGAACCCCAAGACGTCGCCCAA

Gene : TRP-1

Segment # : 8
Offset : 106

1st Codon : 1

RPMVQRLPEPQDVAQCLEVGLFDTPPFYSN
AGGCCTATGGTCCAGAGACTGCCTGAGCCTCaGGATGTGGCTCAGTGTCTGGAAGTGGGACTGTTTGACACyVCCCCCTTTCTA 

Gene : TRP-1 

Segment # : 9
Offset : 121

1st Codon : 1

CLEVGLFDTPPFYSNSTNSFRNTVEGYSDP
TGCCTCGAGGTCGGCCTCTTCGATACCCCTCCCTTTTACTCCAACTCCACCAATAGCTTTAGGAATACCGTCGAGGGATACTCCGACCCT 

Gene : TRP-1 

Segment # : 10 
Offset : 136 
1st Codon : 1 

STNSFRNTVEGYSDPTGKYDPAVRSLHNLA
AGCACAAACTCCTTC».GAAACACAGTGGAAGGCTATAGCGATCCCACyVGGCAAATACGATCCCGCTGTGAGAAGCCTCCAC^ 

Gene : TRP-1

Segment # : 11
Offset : 151

1st Codon : 1

TGKYDPAVRSLHNLAHLFLNGTGGQTHLSS
ACCGGAAAGTATGACCCTGCCGTCAGGTCCCTGCTiTAACCTCGCCCATCTGTTTCTGAATGGCACAGGCGGACAGACACACCTCAGCTCC 

Gene : TRP-1 

Segment # : 12
Offset : 166

1st Codon : 1

HLFLNGTGGQTHLSSQDPIFVLLHTFTDAV
CACCTCTTCCTCAACGGAACCGGAGGCCAaACCCATCTGTCCAGCCAAGACCCTATCTTTGTGCTCCTGCATACCTTTACCGATGCCGTC 

Gene : TRP-1 

Segment # : 13
Offset : 181
1st Codon : 1

QDPIFVLLHTFTDAVFDEWLRRYNADISTF
CAGGATCCCATTTTCGTCCTGCTCCACACATTCACAGACGCTGTGTTTGACGAATGGCTCAGGAGATACAAT6CCGATATCTCCACCTTT 

Gene : TRP-1 

Segment # : 14
Offset : 196
1st Codon : 1

FDEWLRRYNADISTFPLENAPIGHNRQYNM
TTCGATGAGTGGCTGAGAAGGTATAACGCTGACATTAGCACATTCCCTCTGGAAAACGCTCCCATTGGCCATAACAGACAGTATAACATG 

Gene : TRP-1 

Segment # : 15
Offset : 211 
1st Codon : 1 

PLENAPIGHNRQYNMVPFWPPVTNTEMFVT 
CCCCTCGAGAATGCCCCTATCGGACACAATAGGCAATAC?iATATGGTCCCCTTTTGGCCTCCCGTCACCAATACCG2!iAATGTTTGTGACA 

Gene : TRP-1 

Segment # : 16
Offset : 226
1st Codon : 1

VPFWPPVTNTEMFVTAPDNLGYTYEAA
GTGCCTTTCTGGCCCCCTGTGACAAACACAGAGATGTTCGTCACCGCTCCCGATAACCTCGGCTATACCTATGAGGCTGCC 

Gene : Tyros 

Segment # : 1
Offset : 1
1st Codon : 1

AAMLLAVLYCLLWSFQTSAGHFPRACVSSK
GCCGCTATGCTCCT6GCTGTGCTCTACTGTCTGCTCTGGTCCTTCCAAACCTCCGCC6GACACTTTCCCAGAGCCTGTGTGTCCAGCAAA 

Gene : Tyros 

Segment # : 2
Offset : 16
1st Codon : 1

QTSAGHFPRACVSSKNLMEKECCPPWSGDR
CAGACAAGCGCTGGCCATTTCCCTAGGGCTTGCGTCAGCTCCAAGAATCTGATGGAGAAAGAGTGTTGCCGTCCCTQGAGCGGAGACAGA 

Gene : Tyros 

Segment # : 3
Offset : 31 
1st Codon : 1 

NLMEKECCPPWSGDRSPCGQLSGRGSCQNI 
AACCTCATGGAAAAGGAATGCTGTCCCCCTTGGTCCGGCGATAGGTCCCCCTGTGGCCAACTGTCCGGCAGAGGCTCCTGCCAAAACATT 

Gene : Tyros 

Segment# : 4 
Offset : 46
1st Codon : 1

SPCGQLSGRGSCQNILLSNAPLGPQFPFTG
AGCCCTTGCGGACAGCTCAGCGGAAGGGGAAGCTGTCa^GAATATCCTCCTGTCCaU^CGCTCCCCTCGGCCCTCAGTTTCCCTTTACCGGA 

Gene : Tyros 

Segment # : 5
Offset : 61 
1st Codon : 1 

LLSNAPLGPQFPFTGVDDRESWPSVFYNRT 
CTGCTCAGCAATGCCCCTCTGGGACCCa^TTCCCTTTCAO^GGCGTCGACGATAGGGAAAGCTGGCCCTCCGTGTTTTACAATAG 

Gene : Tyros 

Segment # : 6 
Offset : 76 

1st Codon : 1 

VDDRESWPSVFYNRTCQCSGNFMGFNCGNC
GTG6ATGACAGAGAGTCCTGGCCTAGCGTCTTCTATAACAGAACCTGTCAGTGTAGCGGAAACTTTATGGGATTCAATTGCGGAAACTGT 

Gene : Tyros 

Segment # : 7
Offset : 91 
1st Codon : 1 

CQCSGNFMGFNCGNCKFGFWGPNCTERRLL 
TGCCAATGCTCCGGCAATTTCATGGGCTTTAACTGTGGCAATTGCAAATTCGGATTCTGGGGCCCTAACTGTACCGAAAGGAGACTGCTC 

Gene : Tyros 

Segment # : 8 
Offset : 106 

1st Codon : 1 

KFGFWGPNCTERRLLVRRNIFDLSAPEKDK
AAGTTTGGCTTTTG6GGACCCAATTGCACAGAGAGAAGGCTCCTGGTCAGGAGAAACATTTTCGATCTGTCCGCCCCTGAGAAAGACAAA 

Gene : Tyros 

Segment # : 9
Offset : 121 
1st Codon : 1 

VRRNIFDLSAPEKDKFFAYLTLAKHTISSD 
GTGAGAAGGAATATCTTTGACCTCAGCGCTCCCGAAAAGGATAAGTTTTTCGCTTACCTCACCCTCGCCAAACACACAATCTCCAGCGAT 

Gene : Tyros 

Segment # : 10
Offset : 136
1st Codon : 1

FFAYLTLAKHTISSDYVIPIGTYGQMKNGS
TTCTTTGCCTATCTGACACTGGCTAAGCATACCATTAGCTCCGACTATGTGATTCCCATTGGCACATACGGACAGATGAAGAATGGCTCC 

Gene : Tyros 

Segment # : 11
Offset : 151

1st Codon : 1

YVIPIGTYGQMKNGSTPMFNDINIYDLFVW
TAC6TCATCCCTATCG6AACCTATGGCCAAATGAAAAACGGAAGCACACCCATGTTCAATGACATTAACATTTACGATCTGTTTGTGTGG 

Gene : Tyros 

Segment # : 12

Offset : 166

1st Codon : 1

TPMFNDINIYDLFVWMHYYVSMDALLGGSE
ACCCCTATGTTTAACGATATCAATATCTATGACCTCTTCGTCTGGATGCACTATTACGTCAGCATGGACGCTCTGCTCGGCGGAAGCGAA 

Gene : Tyros 

Segment # : 13 
Offset : 181 
1st Codon : 1

MHYYVSMDALLGGSEIWRDIDFAHEAPAFL 
ATGCATTACTATGTGTCCATGGATGCCCTCCTGGGAGGCTCCGAGATTTGGAGAGACATTGACTTTGCCCATGAGGCTGCCGCTTTCCTC 

Gene : Tyros 

Segment# : 14 
Offset : 196 

1st Codon : 1 

IWRDIDFAHEAPAFLPWHRLFLLRWEQEIQ
ATCTGGAGGGATATCGATTTCGCTCACGAAGCCCCTGCCTTTCT6CCTTGGCATAGGCTCTTCCTCCTGAGATGGGAACAGGAAATCCAA 

Gene : Tyros

Segment # : 15
Offset : 211

1st Codon : 1

PWHRLFLLRWEQEIQKLTGDENFTIPYWDW
CCCTGGOlCAGACTGTTTCTGCTCaGGTGGGAGCAAGAGATTCaLGAAACTGACAGGCGATGAGAATTTCACAATCCC 

Gene : Tyros 

Segment # : 16
Offset : 226

1st Codon : 1

KLTGDENFTIPYWDWRDAEKCDICTDEYMG 
AAGCTCACCGGAGACGAAT^CTTTACCATTCCCTATTGGGATTGGAGAGACGCTGAGAAATGCGATATCTGTACCGATGAGTATATGGGA 

Gene : Tyros 

Segment # : 17 
Offset : 241 
1st Codon : 1 

RDAEKCDICTDEYMGGQHPTNPNLLSPASF
AGGGATGCCGAAAAGTGTGACATTTGCACAGACGAATACATGGGCGGACAGCATCCCACAAACCCTAACCTCCTGTCCCCCGCTAGCTTT 

Gene : Tyros 

Segment # : 18 
Offset : 256
1st Codon : 1

GQHPTNPNLLSPASFFSSWQIVCSRLEEYN
GGCCAACACCCTACCAATCCCAATCTGCTCAGCCCTGCCTCCTTCTTTAGCTCCTGGCAAATCGTCTGCTCCAGGCTCGAGGAATACAAT 

Gene : Tyros

Segment # : 19
Offset : 271
1st Codon : 1

FSSWQIVCSRLEEYNSHQSLCNGTPEGPLR
TTCTCCAGCTGGCAGATTGTGTGTAGCAGACTGGAA6AGTATAACTCCCACCAAAGCCTCTGCAATGGCAGACCCGAAGGCCCTCTGAGA 

Gene : Tyros 

Segment # : 20 
Offset : 286 

1st Codon : 1 

SHQSLCNGTPEGPLRRNPGNHDKSRTPRLP
AGCCATCAGTCCCTGTGTAACGGAACCCCTGAGGGACCCCTCAGGAGAAACCCTGGCAATCACGATAAGTCCAGGACACCCAGACTGCCT 

Gene : Tyros 

Segment # : 21 
Offset : 301 

1st Codon : 1 

RNPGNHDKSRTPRLPSSADVEFCLSLTQYE
AGGAATCCCGGAAACCATGACAAAAGCAGAACCCCTAGGCTCCCCTCCAGCGCTGACGTCGAGTTTTGCCTCAGCCTCACCCAATACGAA 

Gene : Tyros 

Segment # : 22
Offset : 316
1st Codon : 1

SSADVEFCLSLTQYESGSMDKAANFSFRNT
AGCTCCGCCGATGTGGAATTCTGTCTGTCCCTGACACAGTATGAGTCCGGCTCCATGGATAAGGCTGCCAATTTCTCCTTCAGAAACACA

Gene : Tyros

Segment # : 23
Offset : 331

1st Codon : 1

SGSMDKAANFSFRNTLEGFASPLTGIADAS
AGCGGAAGCATGGACAAAGCCGCTAACTTTAGCTTTAGGAATACCCTCGAGGGATTCGCTAGCCCTCTGACAGGCATTGCCGATGCCTCC 

Gene : Tyros 

Segment # : 24
Offset : 346
1st Codon : 1

LEGFASPLTGIADASQSSMHNALHIYMNGT
CTGGAAGGCTTTGCCTCCCCCCTCACCGGAATCGCTGACGCTAGCCAAAGCTCCATGCATAACGCTCTGCATATCTATATGAATGGC^ 

Gene : Tyros 

Segment # : 25
Offset : 361
1st Codon : 1

QSSMHNALHIYMNGTMSQVQGSANDPIFLL
CAGTCCAGCATGCACAATGCCCTCCACATTTACATGAACGGAACCATGAGCCAAGTGCAAGGCTCCGCCAAT 

Gene : Tyros 

Segment # : 26
Offset : 376
1st Codon : 1

MSQVQGSANDPIFLLHHAFVDSIFEQWLQR
ATGTCCCAGGTCCAGGGAAGCGCTAACGATCCCATTTTCCTCCTGCATCACGCTTTCGTCGACTCCATCTTTGAGCAATGGCTCCAGAGA 

Gene : Tyros 

Segment # : 27
Offset : 391
1st Codon : 1

HHAFVDSIFEQWLQRHRPLQEVYPEANAPI
CACCATGCCTTTGTGGATAGCATTTTCGAACAGTGGCTGCAAAGGCATAG6CCTCTGCAAGAGGTCTACCCTGAGGCTAACGCTCCCATT 

Gene : Tyros 

Segment # : 28
Offset : 406

1st Codon : 1

HRPLQEVYPEANAPIGHNRESYMVPFIPLY
CACAGACCCCTCCAGGAAGTGTATCCCGAAGCCAATGCCCCTATCGGACACAATAGGGAAAGCTATATGGTCCCCTTTATCCCTCTGTAT 

Gene : Tyros 

Segment # : 29
Offset : 421
1st Codon : 1

GHNRESYMVPFIPLYRNGDFFISSKDLGYD
GGCCATAACAGAGAGTCCTACATGGT6CCTTTCATTCCCCTCTACAGAAACGGAGACTTTTTCATTAGCTCCAAGGATCTGGGATACGAT 

Gene : Tyros 

Segment # : 30
Offset : 436 

1st Codon : 1 

RNGDFFISSKDLGYDYSYLQDSDPDSFQDY 
AGGAATGGCGATTTCTTTATCTCCAGCAAAGACCTCGGCTATGACTATAGCTATCTGCAA6ACTCCGACCCTGACTCCTTCCAAGACTAT 

Gene : Tyros 

Segment # : 31
Offset : 451 

1st Codon : 1 

YSYLQDSDPDSFQDYIKSYLEQASRIWSWL 
TACTCCTACCTCCAGGATAGCGATCCCGATAGCTTTCAGGATTACATTAAGTCCTACCTCGAGCAAGCCTCCAGGATTTGGTCCTGGCTC 

Gene : Tyros 

Segment # : 32
Offset : 466

1st Codon : 1

IKSYLEQASRIWSWLLGAAMVGAVLTALLA
ATCAAAAGCTATCTGGAACAGGCTAGCAGAATCTGGAGCTGGCTGCTCGGCGCTGCCATGGTGGGAGCCGTCCTGACAGCCCl^CCTGGCT 

Gene : Tyros

Segment # : 33
Offset : 481

1st Codon : 1

LGAAMVGAVLTALLAGLVSLLCRHKRKQLP
CTGGGAGCCGCTATGGTCGGCGCTGTGCTCACCGCTCTGCTCGCCGGACTGGTCAGCCTCCTGTGTAGGCATAAGAGAAAGCAACTGCCT 

Gene : Tyros 

Segment # : 34
Offset : 496

1st Codon : 1

GLVSLLCRHKRKQLPEEKQPLLMEKEDYHS
GGCCTCGTGTCCCTGCTCTGCAGACACAAAAGGAAACAGCTCCCCGAAGAGAAACAGCCTCTGCTCATGGAAAAGGAAGACTATCACTCC 

Gene : Tyros 

Segment # : 35 
Offset : 511 

1st Codon : 1 

EEKQPLLMEKEDYHSLYQSHLAA
GAGGAAAAGCAACCCCTCCTGATGGAGAAAGAGGATTACCATAGCCTCTACCAAA6CCATCTGGCTGCC 

Gene : TRP2 

Segment # : 1 
Offset : 1 
1st Codon : 1 

AAMSPLWWGFLLSCLGCKILPGAQGQFPRV
GCCGCTATGTCCCCCCTCTGGTGGGGCTTTCTGCTCAGCTGTCTGGGATGCAAA?kTCCTCCCCGGAGCCCAAGGCCAATTCCCTAGGGTC 

Gene : TRP2 

Segment # : 2
Offset : 16

1st Codon : 1

GCKILPGAQGQFPRVCMTVDSLVNKECCPR
GGCTGTAAGATTCTGCCTGGCGCTCA6GGACAGTTTCCCAGAGTGTGTATGACAGTGGATAGCCTCGTGAATAAGGAATGCTGTCCCAGA 

Gene : TRP2 

Segment # : 3 
Offset : 31 
1st Codon : 1 

CMTVDSLVNKECCPRLGAESANVCGSQQGR
TGCATGACCGTCGACTCCCTGGTCAACAAAGAGTGTTGCCCTAGGCTCGGCGCTGAGTCCGCCAATGTGTGT6GCTCCCAGCAAGGCAGA 

Gene : TRP2 

Segment# : 4 
Offset : 46 

1st Codon : 1 

LGAESANVCGSQQGRGQCTEVRADTRPWSG
CTGGGAGCCGAAAGCGCTAACGTCTGCGGAAGCCAACAGGGAAGGGGACAGTGTACCGAAGTGAGAGCCGATACCAGACCCTGGAGCGGA 

Gene : TRP2

Segment # : 5
Offset : 61
1st Codon : 1

GQCTEVRADTRPWSGPYILRNQDDRELWPR
6GCCAATGCACAGAGGTCAGGGCTGACACAAGGCCTTGGTCCGGCCCTTACATTCTGA6AAACCAAGACGATAGGGAACTGTGGCCCAGA 

Gene : TRP2 

Segment# : 6 
Offset : 76 

1st Codon : 1 

PYILRNQDDRELWPRKFFHRTCKCTGNFAG
CCCTATATCCTCAGGAATCAGGAT6ACAGAGAGCTCTGGCCTAG6AAATTCTTTCACAGAACCTGTAAGT6TACCGGAAACTTTGCCGGA 

Gene : TRP2 

Segment # : 7 
Offset : 91 

1st Codon : 1 

KFFHRTCKCTGNFAGYNCGDCKFGWTGPNC 
AAGTTTTTCCATAGGACATGCAAATGCACAGGCAATTTCGCTGGCTATAACTGTGGCGATTGCAAATTCGGATGGACAGGCCCTAACTGT 

Gene : TRP2 

Segment # : 8
Offset : 106
1st Codon : 1

YNCGDCKFGWTGPNCERKKPPVIRQNIHSL 
TACAATTGCGGAGACTGTAAGTTTGGCTGGACCGGACCC^TTGCGAAAGGAAAAAGCCTCCCGTCATCAGACAGAATATCCATAGCCTC 

Gene : TRP2 

Segment # : 9 
Offset : 121 

1st Codon : 1 

ERKKPPVIRQNIHSLSPQEREQFLGALDLA
GAGAGAAAGAAACCCCCTGTGATTAGGCAAAACATTCACTCCCTGTCCCCCCAAGAGA6AGAGCAATTCCTCGGCGCTCTGGATCTGGCT 

Gene : TRP2 

Segment # : 10 
Offset : 136 
1st Codon : 1 

SPQEREQFLGALDLAKKRVHPDYVITTQHW
AGCCCTCAGGAAAGGGAACAGTTTCTGGGAGCCCTCGACCTCGCCAAAAAGAGAGTGCATCCCGATTACGTCATCACAACCCAACACTGG 

Gene : TRP2 

Segment # : 11 
Offset : 151 
1st Codon : 1 

KKRVHPDYVITTQHWLGLLGPNGTQPQFAN
AAGAAAAGGGTCCACCCTGACTATGTGATTACCACACAGCATTGGCTCGGCCTCCTGGGACCCAATGGCACACAGCCTCAGTTTGCCAAT 

Gene : TRP2 

Segment# : 12 
Offset : 166 

1st Codon : 1 

LGLLGPNGTQPQFANCSVYDFFVWLHYYSV
CTGGGACTGCTCGGCCCTAACGGAACCCAACCCCAATTCGCTAACTGTAGCGTCTACGATTTCTTTGTGTGGCTGCATTACTATAGCGTC 

Gene : TRP2 

Segment # : 13
Offset : 181
1st Codon : 1

CSVYDFFVWLHYYSVRDTLLGPGRPYRAID
TGCTCCGTGTAT6ACTTTTTCGTCTGGCTCCACTATTACTCCGTGAGAGACACACTGCTCGGCCCTGGCAGACCCTATAGGGCTATCGAT 

Gene : TRP2 

Segment # : 14
Offset : 196
1st Codon : 1 

RDTLLGPGRPYRAIDFSHQGPAFVTWHRYH 
AGGGATACCCTCCTGGGACCCGGAAGGCCTTACAGAGCCATTGACTTTAGCCATCAGGGACCCGCTTTCGTCACCTGGCACAGATACCAT 

Gene : TRP2 

Segment # : 15
Offset : 211

1st Codon : 1

FSHQGPAFVTWHRYHLLCLERDLQRLIGNE
TTCTCCCACCAAGGCCCTGCCTTTGTGACATGGCATAGGTATCACCTCCTGTGTCTGGAAAGGGATCTGCAAAGGCTCATCGGAAACGAA 

Gene : TRP2 

Segment # : 16
Offset : 226 
1st Codon : 1 

LLCLERDLQRLIGNESFALPYWNFATGRNE 
CTGCTCTGCCTCGAGAGAGACCTCCAGAGACTGATTGGCAATGAGTCCTTCGCTCTGCCTTACTGGAACTTTGCCACAGGCAGAAACGAA 

Gene : TRP2 

Segment # : 17
Offset : 241

1st Codon : 1

SFALPYWNFATGRNECDVCTDQLFGAARPD
AGCTTTGCCCTCCCCTATTGGAATTTCGCTACCGGAAGGAATGAGTGTGACGTCTGCACAGACCAACTGTTTGGCGCTGCCAGACCCGAT 

Gene : TRP2 

Segment # : 18
Offset : 256
1st Codon : 1

CDVCTDQLFGAARPDDPTLISRNSRFSSWE
TGCGATGTGTGTACCGATCAGCTCTTCGGAGCCGCTAGGCCTGACGATCCCACACTGATTAGCAGAAACTCCAGGTTTAGCTCC

Gene : TRP2 

Segment# : 19 
Offset : 271 

1st Codon : 1 

DPTLISRNSRFSSWETVCDSLDDYNHLVTL
GACCCTACCCTCATCTCCAGGAATAGCAGATTCTCCaiGCTGGGAGACyVGTGTGTGACTCCCTGGATGACTATAACCATCTGGTCACCCTC 

Gene : TRP2 

Segment# : 20 
Offset : 286 
1st Codon : 1 

TVCDSLDDYNHLVTLCNGTYEGLLRRNQMG
ACCGTCTGCGATAGCCTCGACGATTACAATCACCTCGTGACACTGTGTAACGGAACCTATGAGGGACTGCTCAGGA6AAACCAAATGGGA 

Gene : TRP2 

Segment # : 21 
Offset : 301 
1st Codon : 1 

CNGTYEGLLRRNQMGRNSMKLPTLKDIRDC
TGCAATGGCACATACGAAGGCCTCCTGA6AAGGAATCAGATGGGCAGAAACTCCATGAAACTGCCTACCCTCAAGGATATCAGAGACTGT 

Gene : TRP2 

Segment # : 22 
Offset : 316 

1st Codon : 1 

RNSMKLPTLKDIRDCLSLQKFDNPPFFQNS
AGGAATAGCATGAAGCTCCCCACACTGAAAGACATTAGGGATTGCCTCAGCCTCCAGAAATTCGATAACCCTCCCTTTTTCCAAAACTCC 

Gene : TRP2 

Segment # : 23 
Offset : 331 
1st Codon : 1 

LSLQKFDNPPFFQNSTFSFRNALEGFDKAD
CTGTCCCTGCAAAAGTTTGACAATCCCCCTTTCTTTCAGAATAGCACATTCTCCTTCAGAAACGCTCTGGAAGGCTTTGAC2^AAGCCGAT 

Gene : TRP2

Segment # : 24
Offset : 346

1st Codon : 1

TFSFRNALEGFDKADGTLDSQVMSLHNLVH
ACCTTTAGCTTTAGGAATGCCCTCGAGGGATTCGATAAGGCTGACGGAACCCTCGACTCCCAGGTCATGTCCCTGCATAACCTCGTGCAT 

Gene : TRP2 

Segment # : 25 
Offset : 361 
1st Codon : 1 

GTLDSQVMSLHNLVHSFLNGTNALPHSAAN
GGCACACTGGATAGCCAAGTGATGAGCCTCCACAATCTGGTCCACTCCTTCCTC^^CGGAACCAATGCCCTQCCCCATAGCGCTGCCAAT 

Gene : TRP2 

Segment # : 26 
Offset : 376 

1st Codon : 1 

SFLNGTNALPHSAANDPIFVVLHSFTDAIF
AGCTTTCTGAATGGCACAAACGCTCTGCCTCACTCCGCCGCTAACGATCCCATTTTCGTCGTGCTCCACTCCTTCACAGACGCTATCTTT 

Gene : TRP2 

Segment # : 27 
Offset : 391 
1st Codon : 1 

DPIFVVLHSFTDAIFDEWMKRFNPPADAWP
GACCCTATCTTTGTGGTCCTGCATAGCTTTACCGATGCCATTTTCGATGAGTGGATGAAAAGGTTTAACCCTCCCGCTGACGCTTGGCCT 

Gene : TRP2 

Segment # : 28
Offset : 406
1st Codon : 1

DEWMKRFNPPADAWPQELAPIGHNRMYNMV
GACGAATGGATGAAGAGATTCAATCCCCCTGCCGATGCCTGGCCCCAAGAGCTCGCCCCTATCGGACACAATAGGATGTACAATATGGTC

Gene : TRP2

Segment # : 29

Offset : 421

1st Codon : 1

QELAPIGHNRMYNMVPFFPPVTNEELFLTS
CAGGAACTGGCTCCmTTGGCraTAACAGAATGTATAACATGGTGCCTTTCTTTCCCCCTGTGACTiAACGAAGAGCTCT^ 

Gene : TRP2 

Segment # : 30
Offset : 436 
1st Codon : 1 

PFFPPVTNEELFLTSDQLGYSYAIDLPVSV 
CCCTTTTTCCCTCCCGTCACCAATGAGGAACTGTTTCTGACAAGCGATCAGCTCGGCTATAGCTATGCCATTGACCTCCCCGTCAGCGTC 

Gene : TRP2 

Segment # : 31 
Offset : 451 

1st Codon : 1

DQLGYSYAIDLPVSVEETPGWPTTLLVVMG
GACCAACTGGGATACTCCTACGCTATCGATCTGCCTGTGTCCGTGGAAGAGACACCCGGATGGCCTACCACACTGCTCGTGGTCATGGGA 

Gene : TRP2 

Segment # : 32 
Offset : 466 
1st Codon : 1 

EETPGWPTTLLVVMGTLVALVGLFVLLAFL
GAGGAAACCCCTGGCTGGCCCACAACCCTCCTGGTCGTGATGGGCACACTGGTCGCCCTCGTGGGACTGTTTGTGCTCCTGGCTTTCCTC 

Gene : TRP2 

Segment # : 33 
Offset : 481 

1st Codon : 1 

TLVALVGLFVLLAFLQYRRLRKGYTPLMET
ACCCTCGTGGCTCTGGTCGGCCTCTTCGTCCTGCTCGCCTTTCTGCAATACAGAAGGCTCAGGAAAGGCTATACCCCTCTGATGGAGACA 

Gene : TRP2 

Segment # : 34
Offset : 496 

1st Codon : 1 

QYRRLRKGYTPLMETHLSSKRYTEEAAA 
CAGTATAGGAGACTGAGAAAGGGATACACACCCCTCAT6GAAACCCATCTGTCCAGCAAAAGGTATACCGAAGAGGCTGCCGCT 

Gene : MC1R

Segment # : 1
Offset : 1
1st Codon : 1

AAMAVQGSQRRLLGSLNSTPTAIPQLGLAA
GCCGCTATGGCTGTGCAAGGCTCCCAGAGAAGGCTCCTGGGAAGCCTCAACTCCACCCCTACCGCTATCCCTCAGCTCGGCCTCGCCGCT 

Gene : MC1R

Segment # : 2
Offset : 16
1st Codon : 1

LNSTPTAIPQLGLAANQTGARCLEVSISDG
CTGAATAGCACACCCACAGCCATTCCCCAACTGGGACTGGCTGCCAATCAGACAGGCGCTAGGTGTCTGGAAGTGTCCATCTCCGACGGA 

Gene : MC1R

Segment # : 3
Offset : 31

1st Codon : 1

NQTGARCLEVSISDGLFLSLGLVSLVENAL
AACCAAACCGGAGCCAGATGCCTCGAGGTCAGCATTAGCGATGGCCTCTTCCTCAGCCTCGGCCTCGTGTCCCTGGTCGAGAATGCCCTC 

Gene : MC1R

Segment # : 4
Offset : 46

1st Codon : 1

LFLSLGLVSLVENALVVATIAKNRNLHSPM
CTGTTTCT6TCCCTGGGACTGGTCAGCCTCGTGGAAAAC6CTCTGGTCGTGGCTACCATTGCCAAAAACAGAAACCTCCACTCCCCCATG 

Gene : MC1R

Segment # : 5
Offset : 61
1st Codon : 1

VVATIAKNRNLHSPMYCFICCLALSDLLVS
GTGGTCGCCAC:M.TCGCTAAGAATAGGAATCTGmTAGCCCTATGTATT6CTTTATCTGTTGCCTCGCCCTmGCGATCTGC^ 

Gene : MC1R

Segment # : 6
Offset : 76
1st Codon : 1

YCFICCLALSDLLVSGTNVLETAVILLLEA
TACTGTTTCATTTGCTGTCTGGCTCTGTCCGACCTCCTGGTCAGCGGAACCAATGTGCTCGAGACAGCCGTCATCCTCCTGCTCGAGGCT 

Gene : MC1R

Segment # : 7
Offset : 91
1st Codon : 1

GTNVLETAVILLLEAGALVARAAVLQQLDN
GGCACAAAC6TCCTGGAAACCGCTGTGATTCTGCTCCTGGAAGCCGGAGCCCTCGTGGCTAGGGCTGCCGTCCTGCAACAGCTCGACAAT 

Gene : MC1R

Segment # : 8
Offset : 106
1st Codon : 1

GALVARAAVLQQLDNVIDVITCSSMLSSLC
GGCGCTCTGGTCGCCAGAGCCGCTGTGCTCCAGCAACTGGATAACGTCATCGATGTGATTACCTGTAGCTCCATGCTCAGCTCCCTGTGT 

Gene : MC1R

Segment # : 9
Offset : 121

1st Codon : 1

VIDVITCSSMLSSLCFLGAIAVDRYISIFY
GTGATTGACGTCATCACATGCTCCAGCATGCT6TCCAGCCTCTGCTTTCTGGGAGCCATTGCCGTCGACAGATACATTAGCATTTTCTAT 

Gene : MC1R

Segment # : 10
Offset : 136
1st Codon : 1

FLGAIAVDRYISIFYALRYHSIVTLPRAPR
TTCCTCGGCGCTATCGCTGTGGATAGGTATATCTCCATCTTTTACGCTCTGAGATACCATAGCATTGTGACACT6CCTAGGGCTCCCAGA 

Gene : MC1R

Segment # : 11
Offset : 151
1st Codon : 1

ALRYHSIVTLPRAPRAVAAIWVASVVFSTL
6CCCTCAGGTATCACTCCATCGTCACCCTCCCCAGAGCCCCTAGGGCTGTGGCTGCCATTTGGGTCGCCTCCGTGGTCTTCTCCACCCTC 

Gene : MC1R

Segment # : 12
Offset : 166
1st Codon : 1

AVAAIWVASVVFSTLFIAYYDHVAVLLCLV
6CCGTCGCCGCTATCTGGGTGGCTAGCGTCGTGTTTAGCACACTGTTTATCGCTTACTATGACCATGTGGCTGTGCTCCTGTGTCTGGTC 

Gene : MC1R

Segment # : 13
Offset : 181
1st Codon : 1

FIAYYDHVAVLLCLVVFFLAMLVLMAVLYV
TTCATTGCCTATTACGATCACGTCGCCGTCCTGCTCTGCCTCGTGGTCTTCTTTCTGGCTATGCTCGTGCTCATGGCTGTGCTCTACGTC 

Gene : MC1R

Segment # : 14
Offset : 196
1st Codon : 1

VFFLAMLVLMAVLYVHMLARACQHAQGIAR
GTGTTTTTCCTCGCCATGCTGGTCCTGATGGCCGTCCTGTATGTGCATATGCTCGCCAGA6CCTGTCAGCATGCCCAAGGCATTGCCAGA 

Gene : MC1R

Segment # : 15

Offset : 211

1st Codon : 1

HMLARACQHAQGIARLHKRQRPVHQGFGLK
CACATGCTGGCTAGGGCTTGCCAACACGCTCA6GGAATCGCTAGGCTCCACAAAAGGCAAAGGCCTGTGCATCAGGGATTCGGACTGAAA 

Gene : MC1R

Segment # : 16
Offset : 226

1st Codon : 1

LHKRQRPVHQGFGLKGAVTLTILLGIFFLC
CTGCATAAGAGACAGAGACCCGTCCACCAAGGCTTTGGCCTCAAGGGAGCCGTCACCCTCACCATTCTGCTCGGCATTTTCTTTCTGTGT 

Gene : MC1R

Segment # : 17
Offset : 241
1st Codon : 1

GAVTLTILLGIFFLCWGPFFLHLTLIVLCP
GGCGCTGTGACACTGACAATCCTCCTGGGAATCTTTTTCCTCTGCTGGGQCCCTTTCTTTCTGCATCTGACACTGATTGTGCTCTGCCCT 

Gene : MC1R

Segment # : 18
Offset : 256
1st Codon : 1

WGPFFLHLTLIVLCPEHPTCGCIFKNFNLF
TGGG6ACCCTTTTTCCTCCACCTCACCCTCATCGTCCTGTGTCCCGAACACCCTACCTGTGGCTGTATCTTTAAGAATTTCAATCTGTTT 

Gene : MC1R

Segment # : 19
Offset : 271

1st Codon : 1

EHPTCGCIFKNFNLFLALIICNAIIDPLIY
GAGCATCCCACATGCGGATGCATTTTCAAAAACTTTAACCTCTTCCTCGCCCTCATCATTTGCAATGCCATTATCGATCCCCTCATCTAT 

Gene : MC1R

Segment # : 20
Offset : 286
1st Codon : 1

LALIICNAIIDPLIYAFHSQELRRTLKEVL
CTGGCTCTGATTATCTGTAACGCTATCATTGACCCTCTGATTTACGCTTTCCATAGCCAAGAGCTCAGGAGAACCCTCAAGGAAGTGCTC 

Gene : MC1R

Segment # : 21
Offset : 301
1st Codon : 1

AFHSQELRRTLKEVLTCSWAA
GCCTTTCACTCCCAGGAACTGAGAAGGACACTGAAAGAGGTCCTGACATGCTCCTGGGCTGCC 

Gene : MUC1F

Segment # : 1
Offset : 1
1st Codon : 1

AAMTPGTQSPFFLLLLLTVLTVVTGSGHAS
GCCGCTATGACACCCGGAACCCAAAGCCCTTTCTTTCTGCTCCTGCTCCTGACAGTGCTCACCGTCGTGACAGGCTCCGGCCATGCCTCC 

Gene : MUC1F

Segment # : 2
Offset : 16
1st Codon : 1

LLTVLTVVTGSGHASSTPGGEKETSATQRS
CTGCTCACCGTCCTGACAGTGGTCACCGGAAGCGGACACGCTAGCTCCACCCCTGGCGGAGAGAAAGAGACAAGCGCTACCCAAAGGTCC 

Gene : MUC1F

Segment # : 3
Offset : 31 

1st Codon : 1 

STPGGEKETSATQRSSVPSSTEKNAVSMTS 
AGCACACCCGGAGGCGAAAAGGAAACCTCCGCCACACAGAGAAGCTCCGTGCCTAGCTCCACCGAAAAGAATGCC6TCAGCATGACCTCC 

Gene : MUC1F

Segment # : 4
Offset : 46
1st Codon : 1

SVPSSTEKNAVSMTSSVLSSHSPGSGSSTT
AGCGTCCCCTCCAGCACAGAGAAAAACGCTGTGTCCATGACAAGCTCCGTGCTCAGCTCCCACTCCCCCGGAAGCGGAAGCTCCACCACA

Gene : MUC1F

Segment # : 5
Offset : 61

1st Codon : 1

SVLSSHSPGSGSSTTQGQDVTLAPATEPAS
AGCGTCCTGTCCAGCCATAGCCCTGGCTCCGGCTCCAGCACAACCCAAGGCCAAGACGTCACCCTCGCCCCTGCCACAGAGCCTGCCTCC 

Gene : MUC1F

Segment # : 6
Offset : 76

1st Codon : 1

QGQDVTLAPATEPASGSAATWGQDVTSVPV
CAGGGACAGGATGTGACACTGGCTCCCGCTACCGAACCCGCTAGCGGAAGCGCTGCCACATGGG6ACAGGATGTGACAA6CGTCCCCGTC 

Gene : MUC1F

Segment # : 7
Offset : 91
1st Codon : 1 

GSAATWGQDVTSVPVTRPALGSTTPPAHDV 
GGCTCCGCCGCTACCTGGGGCCAAGACGTCACCTCCGTGCCTGTGACAAGGCCTGCCCTCGGCTCCACCACACCCCCTGCCCATGACGTC 

Gene : MUC1F

Segment # : 8
Offset : 106
1st Codon : 1

TRPALGSTTPPAHDVTSAPDNKAA
ACCAGACCCGCTCTGGGAAGCACAACCCCTCCCGCTCACGATGTGACAAGCGCTCCCGATAACAAAGCCGCT 

Gene : MUC1R

Segment # : 1 
Offset : 1 
1st Codon : 1 

AANRPALGSTAPPVHNVTSASGSASGSAST 
GCCGCTAACAGACCCGCTCTGGGAAGCACAGCCCCTCCCGTCCACAATGTGACAAGCGCTAGCGGAAGCGCTAGCGGAAGCGCTAGCACA 

Gene : MUC1R

Segment # : 2
Offset : 16
1st Codon : 1

NVTSASGSASGSASTLVHNGTSARATTTPA
AACGTCACCTCCGCCTCCGGCTCCGCCTCCGGCTCCGCCTCCACCCTCGTGCATAACGGAACCTCCGCCAGAGCCACAACCACACCCGCT 

Gene : MUC1R

Segment # : 3 
Offset : 31 

1st Codon : 1 

LVHNGTSARATTTPASKSTPFSIPSHHSDT 
CTGGTCCACAATGGCACAAGCGCTAGGGCTACCACAACCCCTGCCTCCAAGTCCACCCCTTTCTCCATCCCTAGCCATCACTCCGACACA 

Gene : MUC1R

Segment # : 4
Offset : 46
1st Codon : 1

SKSTPFSIPSHHSDTPTTLASHSTKTDASS
AGCAAAAGCACACCCTTTAGCATTCCCTCCCACCATAGCGATACCCCTACCACACTGGCTAGCCATAGCACAAAGACAGACGCTAGCTCC 

Gene : MUC1R

Segment # : 5
Offset : 61
1st Codon : 1

PTTLASHSTKTDASSTHHSSVPPLTSSNHS
CCCACAACCCTCGCCTCCCACTCCACCAAAACCGATGCCTCCAGCACACACCATAGCTCCGTGCCTCCCCTCACCTCCAGCAATCACTCC 

Gene : MUC1R

Segment # : 6
Offset : 76
1st Codon : 1

THHSSVPPLTSSNHSTSPQLSTGVSFFFLS
ACCCATCACTCCAGCGTCCCCCCTCTGACAAGCTCCAACCATAGCACAAGCCCTCAGCTCAGCACAGGCGTCAGCTTTTTCTTTCTGTCC

Gene : MUC1R

Segment # : 7
Offset : 91
1st Codon : 1

TSPQLSTGVSFFFLSFHISNLQFNSSLEDP
ACCTCCCCCCAACTGTCCACCGGAGTGTCCTTCTTTTTCCTCAGCTTTCACATTAGCAATCTGCAATTCAATAGCTCCCTGGAAGACCCT

Gene : MUC1R

Segment # : 8
Offset : 106
1st Codon : 1

FHISNLQFNSSLEDPSTDYYQELQRDISEM
TTCCATATCTCCAACCTCCAGTTTAACTCCAGCCTCGAGGATCCCTCCACCGATTACTATCAGGAACTGCAAAGGGATATCTCCGAGATG

Gene : MUC1R

Segment # : 9
Offset : 121
1st Codon : 1

STDYYQELQRDISEMFLQIYKQGGFLGLSN
AGCACAGACTATTACCAAGAGCTCCAGAGAGACATTAGCGAAATGTTTCTGCAAATCTATAAGCAAGGCGGATTCCTCGGCCTCAGCAAT 

Gene : MUC1R

Segment # : 10 
Offset : 136 

1st Codon : 1 

FLQIYKQGGFLGLSNIKFRPGSVVVQLTLA 
TTCCTCCAGATTTACAAACAGGGAGGCTTTCTGGGACTGTCCAACATTAAGTTTAGGCCTGGCTCCGTGGTCGTGCAACTGACACTGGCT 

Gene : MUC1R

Segment # : 11
Offset : 151
1st Codon : 1

IKFRPGSVVVQLTLAFREGTINVHDVETQF
ATCAAATTCAGACCCGGAAGCGTCGTGGTCCAGCTCACCCTCGCCTTTAGGGAAGGCACAATCAATGTGCATGACGTCGAGACACAGTTT

Gene : MUC1R

Segment # : 12
Offset : 166
1st Codon : 1

FREGTINVHDVETQFNQYKTEAASRYNLTI
TTCAGAGAGGGAACCATTAACGTCCACGATGTGGAAACCCAATTCAATCAGTATAAGACAGAGGCTGCCTCCAGGTATAACCTCACCATT

Gene : MUC1R

Segment # : 13
Offset : 181
1st Codon : 1

NQYKTEAASRYNLTISDVSVSDVPFPFSAQ
AACCAATACAAAACCGAAGCCGCTAGCAGATACAATCTGACAATCTCCGACGTCAGCGTCAGCGATGTGCCTTTCCCTTTCTCCGCCCAA 

Gene : MUC1R

Segment # : 14
Offset : 196
1st Codon : 1

SDVSVSDVPFPFSAQSGAGVPGWGIALLVL
AGCGATGTGTCCGTGTCCGACGTCCCCTTTCCCTTTAGCGCTCAGTCCGGCGCTGGCGTCCCCGGATGGGGAATCGCTCTGCTCGTGCTC 

Gene : MUC1R

Segment # : 15
Offset : 211
1st Codon : 1

SGAGVPGWGIALLVLVCVLVALAIVYLIAL
AGCGGAGCCGGAGTGCCTGGCTGGGGCATTGCCCTCCTGGTCCTGGTCTGCGTCCTGGTCGCCCTCGCCATTGTGTATCTGATTGCCCTC 

Gene : MUC1R

Segment # : 16
Offset : 226
1st Codon : 1

VCVLVALAIVYLIALAVCQCRRKNYGQLDI
GTGTGTGTGCTCGTGGCTCTGGCTATCGTCTACCTCATCGCTCTGGCTGTGTGTCAGTGTAGGAGAAAGAATTACGGACAGCTCGACATT

Gene : MUC1R

Segment # : 17
Offset : 241
1st Codon : 1

AVCQCRRKNYGQLDIFPARDTYHPMSEYPT
GCCGTCTGCCAATGCAGAAGGAAAAACTATGGCCAACTGGATATCTTTCCCGCTAGGGATACCTATCACCCTATGTCCGAGTATCCCA

Gene : MUC1R

Segment # : 18
Offset : 256
1st Codon : 1

FPARDTYHPMSEYPTYHTHGRYVPPSSTDR
TTCCCTGCCAGAGACACATACCATCCCATGAGCGAATACCCTACCTATCACACACACGGAAGGTATGTGCCTCCCTCCAGCACAGACAGA

Gene : MUC1R

Segment # : 19
Offset : 271
1st Codon : 1

YHTHGRYVPPSSTDRSPYEKVSAGNGGSSL
TACCATACCCATGGCAGATACGTCCCCCCTAGCTCCACCGATAGGTCCCCCTATGAGAAAGTGTCCGCCGGAAACGGAGGCTCCAGCCTC

Gene : MUC1R

Segment # : 20
Offset : 286
1st Codon : 1

SPYEKVSAGNGGSSLSYTNPAVAAASANLA
AGCCCTTACGAAAAGGTCAGCGCTGGCAATGGCGGAAGCTCCCTGTCCTACACAAACCCTGCCGTCGCCGCTGCCTCCGCCAATCTGGCT 

Gene : MUC1R

Segment # : 21
Offset : 301
1st Codon : 1

SYTNPAVAAASANLAA
AGCTATACCAATCCCGCTGTGGCTGCCGCTAGCGCTAACCTCGCCGCT
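
The listing above presents each parent sequence as 30-residue segments whose offsets advance by 15 residues, each paired with a coding nucleotide sequence. A minimal Python sketch of that windowing, and of a translation check between a nucleotide line and its peptide line, is given here for orientation only. The function names, the default `size` and `step` values (read off the offsets in the listing) and the use of the standard genetic code are assumptions of this sketch and are not stated in the specification.

```python
# Illustrative sketch only; names and defaults are assumptions, not part of the specification.

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def overlapping_segments(parent, size=30, step=15):
    """Return (offset, segment) pairs; offsets are 1-based as in the listing."""
    segments = []
    start = 0
    while True:
        segments.append((start + 1, parent[start:start + size]))
        if start + size >= len(parent):  # this window reaches the end of the parent
            break
        start += step
    return segments


def translate(dna):
    """Translate a coding sequence, reading from the first codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore an incomplete trailing codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Residues 286 onward of the MUC1R parent (segments 20 and 21), used as a toy parent.
tail = "SPYEKVSAGNGGSSLSYTNPAVAAASANLAA"
for offset, segment in overlapping_segments(tail):
    print(offset, segment)

# Check a nucleotide line against its peptide line (MUC1R segment 21).
dna = "AGCTATACCAATCCCGCTGTGGCTGCCGCTAGCGCTAACCTCGCCGCT"
print(translate(dna) == "SYTNPAVAAASANLAA")
```

Because `translate` raises a `KeyError` on any character outside A, C, G and T, it can also serve to flag nucleotide lines that contain transcription damage.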

Segments in scrambled order: 



gp100 #4

WNRQLYPEWTEAQRLDCWRGGQVSLKVSND
TGGAATAGGCAACTGTATCCCGAATGGACAGAGGCTCAGAGACTGGATTGCTGGAGGGGAGGCCAAGTGTCCCTGAAAGTGTCCAACGAT 

TRP2 #6 

PYILRNQDDRELWPRKFFHRTCKCTGNFAG
CCCTATATCCTCAGGAATCAGGATGACAGAGAGCTCTGGCCTAGGAAATTCTTTCACAGAACCTGTAAGTGTACCGGAAACTTTGCCGGA

Tyros #30

RNGDFFISSKDLGYDYSYLQDSDPDSFQDY
AGGAATGGCGATTTCTTTATCTCCAGCAAAGACCTCGGCTATGACTATAGCTATCTGCAAGACTCCGACCCTGACTCCTTCCAAGACTAT 

TRP-1 #1 

AAPAFLTWHRYHLLRLEKDMQEMLQEPSFS
GCCGCTCCCGCTTTCCTCACCTGGCACAGATACCATCTGCTCAGGCTCGAGAAAGACATGCAGGAAATGCTCCAGGAACCCTCCTTCTCC 

Tyros #29 

GHNRESYMVPFIPLYRNGDFFISSKDLGYD
GGCCATAACAGAGAGTCCTACATGGTGCCTTTCATTCCCCTCTACAGAAACGGAGACTTTTTCATTAGCTCCAAGGATCTGGGATACGAT 

TRP2 #16 

LLCLERDLQRLIGNESFALPYWNFATGRNE
CTGCTCTGCCTCGAGAGAGACCTCCAGAGACTGATTGGCAATGAGTCCTTCGCTCTGCCTTACTGGAACTTTGCCACAGGCAGAAACGAA

gp100 #23

TTEVVGTTPGQAPTAEPSGTTSVQVPTTEV 
ACCACAGAGGTCGTGGGAACCACACCCGGACAGGCTCCCACAGCCGAACCCTCCGGCACAACCTCCGTGCAAGTGCCTACCACAGAGGTC 

MUC1R #9

STDYYQELQRDISEMFLQIYKQGGFLGLSN
AGCACAGACTATTACCAAGAGCTCCAGAGAGACATTAGCGAAATGTTTCTGCAAATCTATAAGCAAGGCGGATTCCTCGGCCTCAGCAAT

gp100 #36

ACMEISSPGCQPPAQRLCQPVLPSPACQLV
GCCTGTATGGAAATCTCCAGCCCTGGCTGTCAGCCTCCCGCTCAGAGACTGTGTCAGCCTGTGCTCCCCTCCCCCGCTTGCCAACTGGTC 

TRP2 #31 

DQLGYSYAIDLPVSVEETPGWPTTLLVVMG
GACCAACTGGGATACTCCTACGCTATCGATCTGCCTGTGTCCGTGGAAGAGACACCCGGATGGCCTACCACACTGCTCGTGGTCATC^
TRP-1 #7 

TEDGPIRRNPAGNVARPMVQRLPEPQDVAQ 
ACCGAAGACGGACCCATTAGGAGAAACCCTGCCGGAAACGTCGCCAGACCCATGGTGCAAAGGCTCCCCGAACCCCAAGACGTCGCCC^ 

TRP2 #3 

CMTVDSLVNKECCPRLGAESANVCGSQQGR
TGCATGACCGTCGACTCCCTGGTCAACAAAGAGTGTTGCCCTAGGCTCGGCGCTGAGTCCGCCAATGTGTGTGGCTCCCAGCAAGGCAGA

MUC1R #13

NQYKTEAASRYNLTISDVSVSDVPFPFSAQ
AACCAATACAAAACCGAAGCCGCTAGCAGATACAATCTGACAATCTCCGACGTCAGCGTCAGCGATGTGCCTTTCCCTTTCTCCGCCCAA 

TRP2 #1 

AAMSPLWWGFLLSCLGCKILPGAQGQFPRV
GCCGCTATGTCCCCCCTCTGGTGGGGCTTTCTGCTCAGCTGTCTGGGATGCAAAATCCTCCCCGGAGCCCAAGGCCAATTCCCTAGGGTC 

gp100 #18

ADLSYTWDFGDSSGTLISRALVVTHTYLEP
GCCGATCTGTCCTACACATGGGATTTCGGAGACTCCAGCGGAACCCTCATCTCCAGGGCTCTGGTCGTGACACACACATACCTCGAGCCT 

gp100 #27

LAEMSTPEATGMTPAEVSIVVLSGTTAAQV
CTGGCTGAGATGAGCAmCCCGAAGCCACAGGCyiTGACCCCTGCCGAAGTGTCCATCGTCGTGCTCAGCGGAACCaCAGCCGCTCAGG 

MUC1R #11

IKFRPGSVVVQLTLAFREGTINVHDVETQF
ATCAAATTCAGACCCGGAAGCGTCGTGGTCCAGCTCACCCTCGCCTTTAGGGAAGGCACAATCAATGTGCATGACGTCGAGACACAGTTT

MUC1F #7

GSAATWGQDVTSVPVTRPALGSTTPPAHDV
GGCTCCGCCGCTACCTGGGGCCAAGACGTCACCTCCGTGCCTGTGACAAGGCCTGCCCTCGGCTCCACCACACCCCCTGCCCATGACGTC

MC1R #16

LHKRQRPVHQGFGLKGAVTLTILLGIFFLC
CTGCATAAGAGAO^GAGACCCGTCa^CCTVAGGCTTTGGCCTCAAGGGAGCCGTCACCCTCACCATTCTGCTCGGCATa^ 

MC1R #20

LALIICNAIIDPLIYAFHSQELRRTLKEVL
CTGGCTCTGATTATCTGTAACGCTATCATTGACCCTCTGATTTACGCTTTCCATAGCCAAGAGCTCAGGAGAACCCTCAAGGAAGTGCTC

TRP2 #7 

KFFHRTCKCTGNFAGYNCGDCKFGWTGPNC
AAGTTTTTCCATAGGAmTGCAAATGCACaGGCaATTTCGCTGGCTATAACTGTGGCGATTGCAAATTCGGATGGACa^G 

TRP2 #23 

LSLQKFDNPPFFQNSTFSFRNALEGFDKAD
CTGTCCCTGCAAAAGTTTGACT^TCCCCCTTTCTTTCAGAATAGCACATTCTCCTTCAGAAACGCTCTGGAAGGCTTTGACAAAGCCGA 

MUC1R #4

SKSTPFSIPSHHSDTPTTLASHSTKTDASS
AGCAAAAGCACACCCTTTAGCATTCCCTCCCACCATAGCGATACCCCTACCACACTGGCTAGCCATAGCACAAAGACAGACGCTAGCTCC 

MUC1R #1

AANRPALGSTAPPVHNVTSASGSASGSAST 
GCCGCTAACAGACCCGCTCTGGGAAGCACAGCCCCTCCCGTCCACAATGTGACAAGCGCTAGCGGAAGCGCTAGCGGAAGCGCTAGCACA 

TRP2 #21 

CNGTYEGLLRRNQMGRNSMKLPTLKDIRDC
TGCAATGGCACATACGAAGGCCTCCTGAGAAGGAATCAGATGGGCAGAAACTCCATGAAACTGCCTACCCTCAAGGATATCAGAGACTGT 

MUC1R #6

THHSSVPPLTSSNHSTSPQLSTGVSFFFLS
ACCCATCACTCCAGCGTCCCCCCTCTGACAAGCTCCAACCATAGCACAAGCCCTCAGCTCAGCACAGGCGTCAGCTTTTTCTTTCTGTCC

MC1R #13

FIAYYDHVAVLLCLVVFFLAMLVLMAVLYV
TTCATTGCCTATTACGATCACGTCGCCGTCCTGCTCTGCCTCGTGGTCTTCTTTCTGGCTATGCTCGTGCTCATGGCTGTGCTCTACGTC 

Tyros #16 

KLTGDENFTIPYWDWRDAEKCDICTDEYMG
AAGCTCACCGGAGACGAAAACTTTACCATTCCCTATTGGGATTGGAGAGACGCTGAGAAATGCGATATCTGTACCGATGAGTATATGGGA

gp100 #32

LRLVKRQVPLDCVLYRYGSFSVTLDIVQGI 
CTGAGACTGGTCAAGAGAmGGTCCCCCTCGACTGTGTGCTCTACAGATACGGAAGCTTTAGCGTCACCCTCGACATTGTGC:^ 

MUC1R #10

FLQIYKQGGFLGLSNIKFRPGSVVVQLTLA
TTCCTCCAGATTTACAAACAGGGAGGCTTTCTGGGACTGTCCAACATTAAGTTTAGGCCTGGCTCCGTGGTCGTGCAACTGACACTGGCT

MC1R #9

VIDVITCSSMLSSLCFLGAIAVDRYISIFY 
GTGATTGACGTCATCACATGCTCCAGCATGCTGTCCAGCCTCTGCTTTCTGGGAGCCATTGCCGTCGACAGATACATTAGCATTTTCTAT 

Tyros #21 

RNPGNHDKSRTPRLPSSADVEFCLSLTQYE 
AGGAATCCCGGAAACCATGACAAAAGCAGAACCCCTAGGCTCCCCTCCAGCGCTGACGTCGAGTTTTGCCTCAGCCTCACCCAATACGAA 

TRP-1 #14 

FDEWLRRYNADISTFPLENAPIGHNRQYNM 
TTCGATGAGTGGCTGAGAAGGTATAACGCTGACATTAGCACATTCCCTCTGGAAAACGCTCCCATTGGCCATAACAGACAGTATAACATG 

gp100 #39

VSLADTNSLAVVSTQLIMPGQEAGLGQVPL
GTGTCCCTGGCTGACACAAACTCCCTGGCTGTGGTCAGCACACAGCTCATmTGCCCGGAmGGAAGCCGGAC 

gp100 #20

GPVTAQVVLQAAIPLTSCGSSPVPGTTDGH 
GGCCCTGTGACAGCCCAAGTGGTCCTGCAAGCCGCTATCCCTCTGACAAGCTGTGGCTCCAGCCCTGTGCCTGGCACAACCGATGGCCAT 

Tyros #8 

KFGFWGPNCTERRLLVRRNIFDLSAPEKDK
AAGTTTGGCTTTTGGGGACCCAATTGCACAGAGAGAAGGCTCCTGGTCAGGAGAAACATTTTCGATCTGTCCGCCCCTGAGAAAGACAAA

gp100 #13

LGTHTMEVTVYHRRGSRSYVPLAHSSSAFT
CTGGGAACCCATACCATGGAGGTCACCGTCTACCATAGGAGAGGCTCCAGGTCCTACGTCCCCCTCGCCCATAGCTCCAGCGCTTTCACA

MC1R #12

AVAAIWVASVVFSTLFIAYYDHVAVLLCLV 
GCCGTCGCCGCTATCTGGGTGGCTAGCGTCGTGTTTAGCACACTGTTTATCGCTTACTATGACCATGTGGCTGTGCTCCTGTGTCTGGTC

TRP2 #25 

GTLDSQVMSLHNLVHSFLNGTNALPHSAAN
GGCACACTGGATAGCCTUiGTGATGAGCCTCCACaATCTGGTCCACTCCTTCCTCS^CGGAACCAATGCCCTCCCCCATAGCGCTGCC^ 

MART #4 

GCWYCRRRNGYRALMDKSLHVGTQCALTRR 
GGCTGTTGGTATTGCAGAAGGAGAAACGGATACAGAGCCCTCATGGATAAGTCCCTGCATGTGGGAACCCAATGCGCTCTGACAAGGAGA 

Tyros #15 

PWHRLFLLRWEQEIQKLTGDENFTIPYWDW
CCCTGGCACAGACTGTTTCTGCTCAGGTGGGAGCAAGAGATTCAGAAACTGACAGGCGATGAGAATTTCACAATCCCTTACTGGGACTGG 

MC1R #1

AAMAVQGSQRRLLGSLNSTPTAIPQLGLAA
GCCGCTATGGCTGTGCAAGGCTCCCAGAGAAGGCTCCTGGGAAGCCTCAACTCCACCCCTACCGCTATCCCTCAGCTCGGCCTCGCCGCT 

MC1R #5

VVATIAKNRNLHSPMYCFICCLALSDLLVS
GTGGTCGCCACAATCGCTAAGAATAGGAATCTGCATAGCCCTATGTATTGCTTTATCTGTTGCCTCGCCCTCAGCGATCTGCTCGTGTCC 

Tyros #25 

QSSMHNALHIYMNGTMSQVQGSANDPIFLL
CAGTCCAGCATGCACAATGCCCTCCACATTTACATGAACGGAACCATGAGCCAAGTGCAAGGCTCCGCCAATGACCCTATCTTTCTGCTC 

Tyros #18 

GQHPTNPNLLSPASFFSSWQIVCSRLEEYN
GGCCAACACCCTACCAATCCCAATCTGCTCAGCCCTGCCTCCTTCTTTAGCTCCTGGCAAATCGTCTGCTCCAGGCTCGAGGAATACAAT 

MC1R #6

YCFICCLALSDLLVSGTNVLETAVILLLEA
TACTGTTTCATTTGCTGTCTGGCTCTGTCCGACCTCCTGGTCAGCGGAACCAATGTGCTCGAGACAGCCGTCATCCTCCTGCTCGAGGCT
TRP2 #19 

DPTLISRNSRFSSWETVCDSLDDYNHLVTL
GACCCTACCCTCATCTCCAGGAATAGCAGATTCTCCAGCTGGGAGACAGTGTGTGACTCCCTGGATGACTATAACCATCTGGTCACCCTC 

MUC1F #8

TRPALGSTTPPAHDVTSAPDNKAA 
ACCAGACCCGCTCTGGGAAGCACAACCCCTCCCGCTCACGATGTGACAAGCGCTCCCGATAACAAAGCCGCT 

Tyros #17 

RDAEKCDICTDEYMGGQHPTNPNLLSPASF 
AGGGATGCCGAAAAGTGTGACATTTGCACAGACGAATACATGGGCGGACAGCATCCCACAAACCCTAACCTCCTGTCCCCCGCTAGCTTT 

gp100 #17

TFALQLHDPSGYLAEADLSYTWDFGDSSGT
ACCTTTGCCCTCCAGCTCCACGATCCCTCCGGCTATCTGGCTGAGGCTGACCTCAGCTATACCTGGGACTTTGGCGATAGCTCCGGCACA 

Tyros #22 

SSADVEFCLSLTQYESGSMDKAANFSFRNT
AGCTCCGCCGATGTGGAATTCTGTCTGTCCCTGACACAGTATGAGTCCGGCTCCATGGATAAGGCTGCCAATTTCTCCTTCAGAAACACA

gp100 #6

GPTLIGANASFSIALNFPGSQKVLPDGQVI
GGCCCTACCCTCATCGGAGCCAATGCCTCCTTCTCCATCGCTCTGAATTTCCCTGGCTCCCAGAAAGTGCTCCCCGATGGCCAAGTGATT

MC1R #18

WGPFFLHLTLIVLCPEHPTCGCIFKNFNLF
TGGGGACCCTTTTTCCTCCACCTCACCCTCATCGTCCTGTGTCCCGAACACCCTACCTGTGGCTGTATCTTTAAGAATTTCAATCTGTTT

Tyros #7 

CQCSGNFMGFNCGNCKFGFWGPNCTERRLL
TGCCAATGCTCCGGCAATTTCATGGGCTTTAACTGTGGCAATTGCAAATTCGGATTCTGGGGCCCTAACTGTACCGAAAGGAGACTGCTC 

TRP2 #34 

QYRRLRKGYTPLMETHLSSKRYTEEAAA 
CAGTATAGGAGACTGAGAAAGGGATACAmcCCCTCATGGAAACCCATCTGTCCAGCAAAAGGTATACCGAAGAGG 

TRP-1 #15 

PLENAPIGHNRQYNMVPFWPPVTNTEMFVT 
CCCCTCGAGAATGCCCCTATCGGACACAATAGGCAATACAATATGGTCCCCTTTTGGCCTCCCGTCACCAATACCGAAATGTTTGTGACA 

gp100 #7

NFPGSQKVLPDGQVIWVNNTIINGSQVWGG
AACTTTCCCGGAAGCCAAAAGGTCCTGCCTGACGGACAGGTCATCTGGGTGAATAACACAATCATTAACGGAAGCCAAGTGTGGGGCGGA 

gp100 #22

RPTAEAPNTTAGQVPTTEVVGTTPGQAPTA 
AGGCCTACCGCTGAGGCTCCCAATACCACAGCCGGACAGGTCCCCACAACCGAAGTGGTCGGCACAACCCCTGGCCAAGCCCCTACCGCT 

MUC1F #3

STPGGEKETSATQRSSVPSSTEKNAVSMTS
AGCACACCCGGAGGCGAAAAGGAAACCTCCGCCACACAGAGAAGCTCCGTGCCTAGCTCCACCGAAAAGAATGCCGTCAGCATGACCTCC

gp100 #42

LIYRRRLMKQDFSVPQLPHSSSHWLRLPRI
CTGATTTACAGAAGGAGACTGATGAAGCAAGACTTTAGCGTCCCCCAACTGCCTCACTCCAGCTCCCACTGGCTGAGACTGCCTAGGATT

TRP2 #12 

LGLLGPNGTQPQFANCSVYDFFVWLHYYSV
CTGGGACTGCTCGGCCCTAACGGAACCCAACCCCAATTCGCTAACTGTAGCGTCTACGATTTCTTTGTGTGGCTGCATTACTATAGCGTC 

TRP-1 #9 

CLEVGLFDTPPFYSNSTNSFRNTVEGYSDP
TGCCTCGAGGTCGGCCTCTTCGATACCCCTCCCTTTTACTCCAACTCCACCAATAGCTTTAGGAATACCGTCGAGGGATACTCCGACCCT

gp100 #1

AAMDLVLKRCLLHLAVIGALLAVGATKVPR
GCCGCTATGGATCTGGTCCTGAAAAGGTGTCTGCTCCACCTCGCCGTCATCGGAGCCCTCCTGGCTGTGGGAGCCACAAAGGTCCCCAGA

MC1R #3

NQTGARCLEVSISDGLFLSLGLVSLVENAL
AACCAAACCGGAGCCAGATGCCTCGAGGTCAGCATTAGCGATGGCCTCTTCCTraGCCTCGGCCTCGTGTCCCTGGTCGAGAATGCCCTC

Tyros #23 

SGSMDKAANFSFRNTLEGFASPLTGIADAS 
AGCGGAAGCATGGAGAAAGCCGCTAACTTTAGCTTTAGGAATACCCTCGAGGGATTCGCTAGCCCTCTGACAGGCATTGCCGATGCCTCC 

Tyros #4 

SPCGQLSGRGSCQNILLSNAPLGPQFPFTG 
AGCCCTTGCGGACAGCTCAGCGGAAGGGGAAGCTGTCAGAATATCCTCCTGTCCAACGCTCCCCTCGGCCCTCAGTTTCCCTTTACCGGA 

Tyros #13 

MHYYVSMDALLGGSEIWRDIDFAHEAPAFL
ATGCATTACTATGTGTCCATGGATGCCCTCCTGGGAGGCTCCGAGATTTGGAGAGACATTGACTTTGCCCATGAGGCTCCCGCTTTCCTC 

Tyros #35 

EEKQPLLMEKEDYHSLYQSHLAA
GAGGAAAAGCAACCCCTCCTGATGGAGAAAGAGGATTACCATAGCCTCTACCAAAGCCATCTGGCTGCC

TRP2 #5 

GQCTEVRADTRPWSGPYILRNQDDRELWPR 
GGCCaATGCACAGAGGTCAGGGCTGACACAAGGCCTTGGTCCGGCCCTTACy^TTCTGAGAAACCaAGACGATAGGGAACTGTGGCCm^^ 

MUC1F #4

SVPSSTEKNAVSMTSSVLSSHSPGSGSSTT 
AGCGTCCCCTCCAGCACAGAGAAAAACGCTGTGTCCATGACAAGCTCCGTGCTCAGCTCCCACTCCCCCGGAAGCGGAAGCTCCACCACA 

Tyros #12 

TPMFNDINIYDLFVWMHYYVSMDALLGGSE
ACCCCTATGTTTAACGATATCAATATCTATGACCTCTTCGTCTGGATGCACTATTACGTCAGCATGGACGCTCTGCTCGGCGGAAGCGAA 

gp100 #9

QPVYPQETDDACIFPDGGPCPSGSWSQKRS
CAGCCTGTGTATCCCCAAGAGACAGACGATGCCTGTATCTTTCCCGATGGCGGACCCTGTCCCTCCGGCTCCTGGTCCCAGAAAAGGTCC

TRP-1 #6

DSLEDYDTLGTLCNSTEDGPIRRNPAGNVA
GACTCCCTGGAAGACTATGACAGACTGGGAACCCTCTGCAATAGCACAGAGGATGGCCCTATCAGAAGGAATCCCGCTGGCAATGTGGCT 

gp100 #8

WVNNTIINGSQVWGGQPVYPQETDDACIFP
TGGGTCAACAATACCATTATCAATGGCTCCCAGGTCTGGGGAGGCCAACCCGTCTACCCTCAGGAAACCGATGACGCTTGCATTTTCCCT

MART #7 

QEKNCEPVVPNAPPAYEKLSAEQSPPPYSP
CAGGAAAAGAATTGCGAACCCGTCGTGCCTAACGCTCCCCCTGCCTATGAGAAACTGTCCGCCGAACAGTCCCCCCCTCCCTATAGCCCT 

gp100 #14

SRSYVPLAHSSSAFTITDQVPFSVSVSQLR 
AGCAGAAGCTATGTGCCTCTGGCTCACTCCAGCTCCGCCTTTACCATTACCGATCAGGTCCCCTTTAGCGTCAGCGTCAGCCAACTGAGA 

TRP-1 #2 

LEKDMQEMLQEPSFSLPYWNFATGKNVCDI
CTGGAAAAGGATATGCAAGAGATGCTGCAAGAGCCTAGCTTTAGCCTCCCCTATTGGAATTTCGCTACCGGAAAGAATGTGTGTGACATT 

TRP-1 #16 

VPFWPPVTNTEMFVTAPDNLGYTYEAA
GTGCCTTTCTGGCCCCCTGTGACAAACACAGAGATGTTCGTCACCGCTCCCGATAACCTCGGCTATACCTATGAGGCTGCC 

TRP2 #13 

CSVYDFFVWLHYYSVRDTLLGPGRPYRAID
TGCTCCGTGTATGACTTTTTCGTCTGGCTCCACTATTACTCCGTGAGAGACACACTGCTCGGCCCTGGCAGACCCTATAGGGCTATCGAT

Tyros #9 

VRRNIFDLSAPEKDKFFAYLTLAKHTISSD
GTGAGAAGGAATATCTTTGACCTCAGCGCTCCCGAAAAGGATAAGTTTTTCGCTTACCTCACCCTCGCCAAACACACAATCTCCAGCGAT 

MART #2 

KKGHGHSYTTAEEAAGIGILTVILGVLLLI
AAGAAAGGCCATGGCCATAGCTATACCACAGCCGAAGAGGCTGCCGGAATCGGAATCCTCACCGTCATCCTCGGCGTCCTGCTCCTGATT

gp100 #11

FVYVWKTWGQYWQVLGGPVSGLSIGTGRAM
TTCGTCTACGTCTGGAAAACCTGGGGCCAATACTGGCAGGTCCTGGGAGGCCCTGTGTCCGGCCTCAGCATTGGCACAGGCAGAGCCATG

gp100 #12

GGPVSGLSIGTGRAMLGTHTMEVTVYHRRG
GGCGGACCCGTCAGCGGACTGTCCATCGGAACCGGAAGGGCTATGCTCGGCACACy^CACAATGGAAGTGACAGTGTATCACAGAAGG 

gp100 #25

ISTAPVQMPTAESTGMTPEKVPVSEVMGTT 
ATCTCCACCGCTCCCGTCCAGATGCCCACAGCCGAAAGCACAGGCATGACCCCTGAGAAAGTGCCTGTGTCCGAGGTCATGGGAACCACA 

Tyros #19 

FSSWQIVCSRLEEYNSHQSLCNGTPEGPLR
TTCTCCAGCTGGCAGATTGTGTGTAGCAGACTGGAAGAGTATAACTCCCACCAAAGCCTCTGCAATGGCACACCCGAAGGCCCTCTGAGA 

TRP2 #27 

DPIFVVLHSFTDAIFDEWMKRFNPPADAWP
GACCCTATCTTTGTGGTCCTGCATAGCTTTACCGATGCCATTTTCGATGAGTGGATGAAAAGGTTTAACCCTCCCGCTGACGCTTGGCCT 

MC1R #15

HMLARACQHAQGIARLHKRQRPVHQGFGLK
CACATGCTGGCTAGGGCTTGCCAAO^CGCTCa^GGGAATCGCTAGGCTCCACAAAAGGCMAGGCCTGTGCA^ 

MUC1F #2

LLTVLTVVTGSGHASSTPGGEKETSATQRS
CTGCTCACCGTCCTGACAGTGGTCACCGGAAGCGGACACGCTAGCTCCACCCCTGGCGGAGAGAAAGAGACAAGCGCTACCCAAAGGTCC

gp100 #44

FCSCPIGENSPLLSGQQVAA 
TTCTGTAGCTGTCCCATTGGCGAAAACTCCCCCCTCCTGTCCGGCCAACAGGTCGCCGCT 

TRP2 #24 

TFSFRNALEGFDKADGTLDSQVMSLHNLVH
ACCTTTAGCTTTAGGAATGCCCTCGAGGGATTCGATAAGGCTGACGGAACCCTCGACTCCCAGGTCATGTCCCTGCATAACCTCGTGCAT 

Tyros #20 

SHQSLCNGTPEGPLRRNPGNHDKSRTPRLP 
AGCCATCAGTCCCTGTGTAACGGAACCCCTGAGGGACCCCTCAGGAGAAACCCTGGCAATCACGATAAGTCCAGGACACCCAGACTGCCT

TRP2 #30

PFFPPVTNEELFLTSDQLGYSYAIDLPVSV 
CCCTTTTTCCCTCCCGTCACCAATGAGGAACTGTTTCTGACAAGCGATCAGCTCGGCTATAGCTATGCCATTGACCTCCCCGTCAGCGTC 

TRP2 #9 

ERKKPPVIRQNIHSLSPQEREQFLGALDLA 
GAGAGAAAGAAACCCCCTGTGATTAGGCAAAACATTCACTCCCTGTCCCCCCAAGAGAGAGAGCAATTCCTCGGCGCTCTGGATCTGGCT 

TRP2 #29 

QELAPIGHNRMYNMVPFFPPVTNEELFLTS 
CAGGAACTGGCTCCCATTGGCCATAACAGAATGTATAACATGGTGCCTTTCTTTCCCCCTGTGACAAACGAAGAGCTCTTCCTCACCTCC 

gp100 #28

EVSIVVLSGTTAAQVTTTEWVETTARELPI
GAGGTCAGCATTGTGGTCCTGTCCGGCACAACCGCTGCCCaAGTGACAACCACyiGAGTGGGTGGAAACCACAGCCAGAGAGCT 

MUC1R #7

TSPQLSTGVSFFFLSFHISNLQFNSSLEDP
ACCTCCCCCCAACTGTCCACCGGAGTGTCCTTCTTTTTCCTCAGCTTTCACATTAGCAATCTGCAATTCAATAGCTCCCTGGAAGACCCT 

MUC1R #19

YHTHGRYVPPSSTDRSPYEKVSAGNGGSSL 
TACCATACCCATGGCAGATACGTCCCCCCTAGCTCCACCGATAGGTCCCCCTATGAGAAAGTGTCCGCCGGAAACGGAGGCTCCAGCCTC 

MC1R #4

LFLSLGLVSLVENALVVATIAKNRNLHSPM
CTGTTTCTGTCCCTGGGACTGGTCAGCCTCGTGGAAAACGCTCTGGTCGTGGCTACCATTGCCAAAAACAGAAACCTCCACTCCCCCATG 

TRP2 #26 

SFLNGTNALPHSAANDPIFVVLHSFTDAIF
AGCTTTCTGAATGGCACAAACGCTCTGCCTCACTCCGCCGCTAACGATCCCATTTTCGTCGTGCTCCACTCCTTCACAGACGCTATCTTT

MUC1R #17

AVCQCRRKNYGQLDIFPARDTYHPMSEYPT
GCCGTCTGCCAATGCAGAAGGAAAAACTATGGCCAACTGGATATCTTTCCCGCTAGGGATACCTATCACCCTATGTCCGAGTATCCCA

MC1R #14

VFFLAMLVLMAVLYVHMLARACQHAQGIAR
GTGTTTTTCCTCGCCATGCTGGTCCTGATGGCCGTCCTGTATGTGCATATGCTCGCCAGAGCCTGTCAGCATGCCCAAGGCATTGCCAGA 

TRP-1 #10 

STNSFRNTVEGYSDPTGKYDPAVRSLHNLA 
AGCACAAACTCCTTCAGAAACACAGTGGAAGGCTATAGCGATCCCACAGGCAAATACGATCCCGCTGTGAGAAGCCTCCACAATCTGGCT 

TRP-1 #3 

LPYWNFATGKNVCDICTDDLMGSRSNFDST
CTGCCTTACTGGAACTTTGCCACAGGCAAAAACGTCTGCGATATCTGTACCGATGACCTCATGGGAAGCAGAAGCAATTTCGATAGCACA

gp100 #15

ITDQVPFSVSVSQLRALDGGNKHFLRNQPL
ATCACAGACCAAGTGGCTTTCTCCGTGTCCGTGTCCCAGCTCAGGGCTCTGGATGGCGGAAACT^aACACTTTCTGAGA^ 

MUC1R #8

FHISNLQFNSSLEDPSTDYYQELQRDISEM
TTCCATATCTCCAACCTCCAGTTTAACTCCAGCCTCGAGGATCCCTCCACCGATTACTATCAGGAACTGCAAAGGGATATCTCCGAGATG 

MUC1R #20

SPYEKVSAGNGGSSLSYTNPAVAAASANLA 
AGCCCTTACGAAAAGGTCAGCGCTGGCAATGGCGGAAGCTCCCTGTCCTACACAAACCCTGCCGTCGCCGCTGCCTCCGCCAATCTGGCT 

Tyros #11 

YVIPIGTYGQMKNGSTPMFNDINIYDLFVW 
TACGTCaiTCCCTATCGGAACCTATGGCCAAATGAAAAACGGAAGCACACCCy^TGTTCAATGACATT^ 

gp100 #37

RLCQPVLPSPACQLVLHQILKGGSGTYCLN
AGGCTCTGCCAACCCGTCCTGCCTAGCCCTGCCTGTCAGCTCGTGCTCCACCAAATCCTCAAGGGAGGCTCCGGCACATACTGTCTGAAT

gp100 #33

RYGSFSVTLDIVQGIESAEILQAVPSGEGD
AGGTATGGCTCCTTCTCCGTGACACTGGATATCGTCCaGGGAATCGAAAGCGCTGAGATTCTGCAAGCCGTCCCCTCCGGCGAAGGCG^ 

Tyros #27 

HHAFVDSIFEQWLQRHRPLQEVYPEANAPI
CACCATGCCTTTGTGGATAGCATTTTCGAACSiGTGGCrGCAAAGGCaTAGGCCTCTGCAAGAGGTCTACCCTG 

TRP-1 #4 

CTDDLMGSRSNFDSTLISPNSVFSQWRVVC 
TGCACAGACGATCTGATGGGCTCCAGGTCCAACTTTGACTCCACCCTCATCTCCCCCAATAGCGTCTTCTCCCAGTGGAGGGTCGTGTGT 

MUC1R #18

FPARDTYHPMSEYPTYHTHGRYVPPSSTDR
TTCCCTGCCAGAGACACATACCATCCCATGAGCGAATACCCTACCTATCACACACACGGAAGGTATGTGCCTCCCTCCAGCACAGACAGA

MUC1R #21

SYTNPAVAAASANLAA 
AGCTATACCAATCCCGCTGTGGCTGCCGCTAGCGCTAACCTCGCCGCT 

MC1R #19

EHPTCGCIFKNFNLFLALIICNAIIDPLIY
GAGCATCCCACATGCGGATGCATTTTCAAAAACTTTAACCTCTTCCTCGCCCTCATCATTTGCAATGCCATTATCGATCCCCTCATCTAT 

Tyros #26 

MSQVQGSANDPIFLLHHAFVDSIFEQWLQR
ATGTCCCAGGTCCAGGGAAGCGCTAACGATCCCATTTTCCTCCTGCATCACGCTTTCGTCGACTCCATCTTTGAGCAATGGCTCCAGAGA

TRP2 #22 

RNSMKLPTLKDIRDCLSLQKFDNPPFFQNS
AGGAATAGCATGAAGCTCCCCACACTGAAAGACATTAGGGATTGCCTCAGCCTCCAGAAATTCGATAACCCTCCCTTTTTCCAAAACTCC 

gp100 #19

LISRALVVTHTYLEPGPVTAQVVLQAAIPL
CTGATTAGCAGAGCCCTCGTGGTCACCCATACCTATCTGGAACCCGGACCCGTCACCGCTCAGGTCGTGCTCCAGGCTGCCATTCCCCTC

TRP2 #17 

SFALPYWNFATGRNECDVCTDQLFGAARPD
AGCTTTGCCCTCCCCTATTGGAATTTCGCTACCGGAAGGAATGAGTGTGACGTCTGCACAGACCAACTGTTTGGCGCTGCCAGACCCGAT

gp100 #2

VIGALLAVGATKVPRNQDWLGVSRQLRTKA
GTGATTGGCGCTCTGCTCGCCGTCGGCGCTACCAAAGTGCCTAGGAATCAGGATTGGCTCGGCGTCAGCAGACAGCTCAGGACAAAGGCT 

gp100 #16

ALDGGNKHFLRNQPLTFALQLHDPSGYLAE
GCCCTCGACGGAGGCAATAAGCATTTCCTCAGGAATCAGCCTCTGACATTCGCTCTGCAACTGCATGACCCTAGCGGATACCTCGCCGAA 

TRP2 #18 

CDVCTDQLFGAARPDDPTLISRNSRFSSWE
TGCGATGTGTGTACCGATCAGCTCTTCGGAGCCGCTAGGCCTGACGATCCCACACTGATTAGCAGAAACTCCAGGTTTAGCTCCTGGGAA 

MART #1 

AAMPREDAHPIYGYPKKGHGHSYTTAEEAA 
GCCGCTATGCCTAGGGAAGACGCTCACOT^ATCTATGGCTATCCCTiAAAAGGGAa^CGGACACTCCTACAC^ 

TRP-1 #11 

TGKYDPAVRSLHNLAHLFLNGTGGQTHLSS
ACCGGAAAGTATGACCCTGCCGTCAGGTCCCTGCATAACCTCGCCCATCTGTTTCTGAATGGCACAGGCGGACAGACACACCTCAGCTCC

MUC1R #14

SDVSVSDVPFPFSAQSGAGVPGWGIALLVL 
AGCGATGTGTCCGTGTCCGACGTCCCCTTTCCCTTTAGCGCTCAGTCCGGCGCTGGCGTCCCCGGATGGGGAATCGCTCTGCTCGTGCTC 

TRP2 #10 

SPQEREQFLGALDLAKKRVHPDYVITTQHW
AGCCCTCSVGGAAAGGGAACAGTTTCTGGGAGCGCTCGACCTCGCCaAAAAGAGAGTGCATCCCGATTACGTCATCAC^ 

Tyros #10 

FFAYLTLAKHTISSDYVIPIGTYGQMKNGS
TTCTTTGCCTATCTGACACTGGCTAAGCATACCATTAGCTCCGACTATGTGATTCCCATTGGCACATACGGACAGATGAAGAATGGCTCC 

MC1R #7

GTNVLETAVILLLEAGALVARAAVLQQLDN
GGCACAAACGTCCTGGAAACCGCTGTGATTCTGCTCCTGGAAGCCGGAGCCCTCGTGGCTAGGGCTGCCGTCCTGCAACAGCTCGACAAT 

MUC1R #16

VCVLVALAIVYLIALAVCQCRRKNYGQLDI
GTGTGTGTGCTCGTGGCTCTGGCTATCGTCTACCTCATCGCTCTGGCTGTGTGTCAGTGTAGGAGAAAGAATTACGGACAGCTCGACATT 

MART #6 

CPQEGFDHRDSKVSLQEKNCEPVVPNAPPA
TGCCCTCAGGAAGGCTTTGACCATAGGGATAGCAAAGTGTCCCTGCAAGAGAAAAACTGTGAGCCTGTGGTCCCCAATGCCCCTCCCGCT 

MUC1F #5

SVLSSHSPGSGSSTTQGQDVTLAPATEPAS
AGCGTCCTGTCCAGCCATAGCCCTGGCTCCGGCTCCAGCACAACCCAAGGCCAAGACGTCACCCTCGCCCCTGCCACAGAGCCTGCCTCC 

TRP2 #28 

DEWMKRFNPPADAWPQELAPIGHNRMYNMV
GACGAATGGATGAAGAGATTCyVATCCCCCTGCCGATGCCTGGCCCCAAGAGCTCGCCCCTATCGGACACAATAGGATGTAO^ 

MC1R #21

AFHSQELRRTLKEVLTCSWAA
GCCTTTCACTCCCAGGAACTGAGAAGGACACTGAAAGAGGTCCTGACATGCTCCTGGGCTGCC 

TRP2 #15 

FSHQGPAFVTWHRYHLLCLERDLQRLIGNE
TTCTCCCACCAAGGCCCTGCCTTTGTGACATGGCATAGGTATCACCTCCTGTGTCTGGAAAGGGATCTGCAAAGGCTCATCGGAAACGAA 

TRP-1 #8 

RPMVQRLPEPQDVAQCLEVGLFDTPPFYSN
AGGCCTATGGTCCAGAGACTGCCTGAGCCTCAGGATGTGGCTCAGTGTCTGGAAGTGGGACTGTTTGACACACCCCCTTTCTATAGCAAT 

TRP-1 #13 

QDPIFVLLHTFTDAVFDEWLRRYNADISTF
CAGGATCCCATTTTCGTCCTGCTCCACACATTCACAGACGCTGTGTTTGACGAATGGCTCAGGAGATACAATGCCGATATCTCCACCTTT 

TRP2 #4 

LGAESANVCGSQQGRGQCTEVRADTRPWSG 
CTGGGAGCCGAAAGCGCTAACGTCTGCGGAAGCCAACAGGGAAGGGGACAGTGTACCGAAGTGAGAGCCGATACCAGACCCTGGAGCGGA

TRP2 #8 

YNCGDCKFGWTGPNCERKKPPVIRQNIHSL
TACAATTGCGGAGACTGTAAGTTTGGCTGGACCGGACCCAATTGCGAAAGGAAAAAGCCTCCCGTCATCAGACAGAATATCCATAGGCTC 

TRP-1 #12 

HLFLNGTGGQTHLSSQDPIFVLLHTFTDAV
CACCTCTTCCTCAACGGAACCGGAGGCCAAACCCATCTGTCCAGCCAAGACCCTATCTTTGTGCTCCTGCATACCTTTACCGATGCCGTC 

Tyros #34 

GLVSLLCRHKRKQLPEEKQPLLMEKEDYHS
GGCCTCGTGTCCCTGCTCTGCAGACACAAAAGGAAACAGCTCCCCGAAGAGAAACAGCCTCTGCTCATGGAAAAGGAAGACTATCACTCC 

TRP2 #2 

GCKILPGAQGQFPRVCMTVDSLVNKECCPR 
GGCTGTAAGATTCTGCCTGGCGCTCAGGGACAGTTTCCO^iGAGTGTGTATGACAGTGGATAGCCTCGTGAATAAGGAATGCTGTCCCAGA 

gp100 #43

QLPHSSSHWLRLPRIFCSCPIGENSPLLSG
CAGCTCCCCCATAGCTCCAGCCATTGGCTCAGGCTCCCCAGAATCTTTTGCTCCTGCCCTATCGGAGAGAATAGGCCTCTGCTCAGCGGA 

gp100 #10

DGGPCPSGSWSQKRSFVYVWKTWGQYWQVL 



gp100 #3

NQDWLGVSRQLRTKAWNRQLYPEWTEAQRL
AACCAAGACTGGCTGGGAGTGTCCAGGCAACTGAGAACCAAAGCCTGGAACAGAC^GCTCTACCCTGAGTGGACCGAAGCCCAAAG^ 

Tyros #14 

IWRDIDFAHEAPAFLPWHRLFLLRWEQEIQ
ATCTGGAGGGATATCGATTTCGCTCACGAAGCCCCTGCCTTTCTGCCTTGGCATAGGCTCTTCCTCCTGAGATGGGAACAGGAAATCCAA

MUC1F #1

AAMTPGTQSPFFLLLLLTVLTVVTGSGHAS
GCCGCTATGACACCCGGAACCCAAAGCCCTTTCTTTCTGCTCCTGCTCCTGACAGTGCTCACCGTCGTGACAGGCTCCGGCCATGCCTCC

MART #5 

DKSLHVGTQCALTRRCPQEGFDHRDSKVSL
GACAAAAGCCTCCACGTCGGCACACaW3T6TGCCCTC»iCCAGAAGGTGTCCCCAAGAGGGATTCGATCAa^^ 

MUC1R #2

NVTSASGSASGSASTLVHNGTSARATTTPA
AACGTCACCTCCGCCTCCGGCTCCGCCTCCGGCTCCGCCTCCACCCTCGTGCATAACGGAACCTCCGCCAGAGCCACAACCACACCCGCT 

Tyros #24 

LEGFASPLTGIADASQSSMHNALHIYMNGT
CTGGAAGGCTTTGCCTCCCCCCTCACCGGAATCGCTGACGCTAGCCAAAGCTCCATGCATAACGCTCTGCATATCTATATGAATGGCACA 

TRP2 #14 

RDTLLGPGRPYRAIDFSHQGPAFVTWHRYH
AGGGATACGCTCCTGGGACCCGGAAGGCCTTACAGAGCCATTGACTTTAGCCATCAGGGACCCGCTTTCGTCACCTGGCACAGATACCAT 

Tyros #1 

AAMLLAVLYCLLWSFQTSAGHFPRACVSSK 
GCCGCTATGCTCCTGGCTGTGCTCTACTGTCTGCTCTGGTCCTTCCAAACCTCCGCCGGACACTTTCCCa^GAGCCTGTGTGTCa^GC^^ 

gp100 #35

AFELTVSCQGGLPKEACMEISSPGCQPPAQ
GCCTTTGAGCTCACCGTCAGCTGTCAGGGAGGCCTCCCCAAAGAGGCTTGCATGGAGATTAGCTCCCCCGGATGCCAACCCCCTGCCCAA

Tyros #6 

VDDRESWPSVFYNRTCQCSGNFMGFNCGNC 
GTGGATGACAGAGAGTCCTGGCCTAGCGTCTTCTATAACAGAACCTGTCAGTGTAGCGGAAACTTTATGGGATTCAATTGCGGAAACTGT 

gp100 #34

ESAEILQAVPSGEGDAFELTVSCQGGLPKE
GAGTCCGCCGAAATCCTCCAGGCTGTGCCTAGCGGAGAGGGAGACGCTTTCGAACTGACAGTGTCCTGCCAAGGCGGACTGCCTAAGGAA

TRP2 #20 

TVCDSLDDYNHLVTLCNGTYEGLLRRNQMG
ACCGTCTGCGATAGCCTCGACGATTACAATCACCTCGTGACACTGTGTAACGGAACCTATGAGGGACTGCTCAGGAGAAACCAAATGGGA
Tyros #5 

LLSNAPLGPQFPFTGVDDRESWPSVFYNRT 
CTGCTCAGCAATGCCCCTCTGGGACCCCAATTCCCTTTCACAGGCGTCGACGATAGGGAAAGCTGGCCCTCCGTGTTTTACAATAGGACA

MART #8 

YEKLSAEQSPPPYSPAA 
TACGAAAAGCTCAGCGCTGAGCAAAGCCCTCCCCCTTACTCCCCCGCTGCC 

gp100 #41

IVGILLVLMAVVLASLIYRRRLMKQDFSVP
ATCGTCGGCATTCTGCTCGTGCTCATGGCTGTGGTCCTGGCTAGCCTCATCTATAGGAGAAGGCTCATGAAACAGGATTTCTCCGTGCCT 

MART #3 

GIGILTVILGVLLLIGCWYCRRRNGYRALM
GGCATTGGCATTCTGACAGTGATTCTGGGAGTGCTCCTGCTCATCGGATGCTGGTACTGTAGGAGAAGGAATGGCTATAGGGCTCTGATG

Tyros #31 

YSYLQDSDPDSFQDYIKSYLEQASRIWSWL
TACTCCTACCTCCAGGATAGCGATCCCGATAGCTTTCAGGATTACATTAAGTCCTACCTCGAGCAAGCCTCCAGGATTTGGTCCTGGCTC 

MUC1F #6

QGQDVTLAPATEPASGSAATWGQDVTSVPV
CAGGGACAGGATGTGACACTGGCTCCCGCTACCGAACCCGCTAGCGGAAGCGCTGCCACATGGGGACAGGATGTGACAAGCGTCCCCGTC 

gp100 #21

TSCGSSPVPGTTDGHRPTAEAPNTTAGQVP
ACCTCCTGCGGAAGCTCCCCCGTCCCCGGAACCACAGACGGAmCAGACCCACAGCCGAAGCCCCTAACACAACCGCTGGCCM 

MUC1R #3

LVHNGTSARATTTPASKSTPFSIPSHHSDT
CTGGTCCACAATGGCACAAGCGCTAGGGCTACCACAACCCCTGCCTCCAAGTCCACCCCTTTCTCCATCCCTAGCCATCACTCCGACACA

TRP2 #32 

EETPGWPTTLLVVMGTLVALVGLFVLLAFL
GAGGAAACCCCTGGCTGGCCCACAACCCTCCTGGTCGTGATGGGCACACTGGTCGCCCTCGTGGGACTGTTTGTGCTCCTGGCTTTCCTC

gp100 #29

TTTEWVETTARELPIPEPEGPDASSIMSTE
ACCACAACCGAATGGGTCGAGACAACCGCTAGGGAACTGCCTATCCCTGAGCCTGAGGGACCCGATGCCTCCAGCATTATGTCCACCGAA 

MC1R #17

GAVTLTILLGIFFLCWGPFFLHLTLIVLCP
GGCGCTGTGACACTGACAATCCTCCTGGGAATCTTTTTCCTCTGCTGGGGCCCTTTCTTTCTGCATCTGACACTGATTGTGCTCTGCCCT 

Tyros #33 

LGAAMVGAVLTALLAGLVSLLCRHKRKQLP
CTGGGAGCCGCTATGGTCGGCGCTGTGCTCACCGCTCTGCTCGCCGGACTGGTCAGCCTCCTGTGTAGGCATAAGAGAAAGCAACTGCCT 

MC1R #8

GALVARAAVLQQLDNVIDVITCSSMLSSLC
GGCGCTCTGGTCGCCAGAGCCGCTGTGCTCCAGCAACTGGATAACGTCATCGATGTGATTACCTGTAGCTCCATGCTCAGCTCCCTGTGT 

gp100 #26

MTPEKVPVSEVMGTTLAEMSTPEATGMTPA
ATGACACCCGAAAAGGTCCCCGTCAGCGAAGTGATGGGC3^CAACCerCGCCGA2^TGTCCACCCCTGAGGCTACCGGAATGACAC 

Tyros #2 

QTSAGHFPRACVSSKNLMEKECCPPWSGDR 
CAGACAAGCGCTGGCCATTTCCCTAGGGCTTGCGTCAGCTCCAAGAATCTGATGGAGAAAGAGTGTTGCCCTCCCTGGAGCGGAGACAGA

MC1R #11

ALRYHSIVTLPRAPRAVAAIWVASVVFSTL
GCCCTCAGGTATCACTCCATCGTCACCCTCCCCAGAGCCCCTAGGGCTGTGGCTGCCATTTGGGTCGCCTCCGTGGTCTTCTCCACCCTC 

MUC1R #12

FREGTINVHDVETQFNQYKTEAASRYNLTI
TTCAGAGAGGGAACCATTAACGTCCACGATGTGGAAACCCAATTCAATCAGTATAAGACAGAGGCTGCCTCCAGGTATAACCTCACCATT 

Tyros #3 

NLMEKECCPPWSGDRSPCGQLSGRGSCQNI 
AACCTCATGGAAAAGGAATGCTGTCCCCCTTGGTCCGGCGATAGGTCCCCCTGTGGCCAACTGTCCGGCAGAGGCTCCTGCCAAAACATT
Tyros #32 

IKSYLEQASRIWSWLLGAAMVGAVLTALLA 
ATCAAAAGCTATCTGGAACAGGCTAGCAGAATCTGGAGCTGGCTGCTCGGCGCTGCCATGGTGGGAGCCGTCCTGACAGCCCTCGTGGCT

MUC1R #5

PTTLASHSTKTDASSTHHSSVPPLTSSNHS
CCCACAACCCTCGCCTCCCACTCCACCAAAACCGATGCCTCCAGCACACACCATAGCTCCGTGCCTCCCCTCACCTCCAGCAATCACTCC

MUC1R #15

SGAGVPGWGIALLVLVCVLVALAIVYLIAL
AGCGGAGCCGGAGTGCCTGGCTGGGGCATTGCCCTCCTGGTCCTGGTCTGCGTCCTGGTCGCCCTCGCCATTGTGTATCTGATTGCCCTC

MC1R #10

FLGAIAVDRYISIFYALRYHSIVTLPRAPR
TTCCTCGGCGCTATCGCTGTGGATAGGTATATCTCCATCTTTTACGCTCTGAGATACCATAGCATTGTGACACTGCCTAGGGCTCCCAGA

gp100 #40

LIMPGQEAGLGQVPLIVGILLVLMAVVLAS
CTGATTATGCCTGGCCAAGAGGCTGGCCTCGGCCAAGTGCCTCTGATTGTGGGAATCCTCCTGGTCCTGATGGCCGTCGTGCTCGCCTCC

TRP2 #33 

TLVALVGLFVLLAFLQYRRLRKGYTPLMET
ACCCTCGTGGCTCTGGTCGGCCTCTTCGTCCTGCTCGCCTTTCTGCAATACAGAAGGCTCAGGAAAGGCTATACCCCTCTGATGGAGACA 

TRP-1 #5 

LISPNSVFSQWRVVCDSLEDYDTLGTLCNS
CTGATTAGCCCTAACTCCGTGTTTAGCCAATGGAGAGTGGTCTGCGATAGCCTCGAGGATTACGATACCCTCGGCACACTGTGTAACTCC 

MC1R #2

LNSTPTAIPQLGLAANQTGARCLEVSISDG
CTGAATAGCACACCCACAGCCATTCCCCAACTGGGACTGGCTGCCAATCAGACAGGCGCTAGGTGTCTGGAAGTGTCCATCTCCGACGGA 

Tyros #28 

HRPLQEVYPEANAPIGHNRESYMVPFIPLY
CAOiGACCCCTCCSVGGAAGTGTATCCCGAAGCCAATGCCCCTATCGGACACAATAGGGAAAGCTATATGGTCCCCT^ 

gp100 #24

EPSGTTSVQVPTTEVISTAPVQMPTAESTG
GAGCCTAGCGGAACCACAAGCGTCCAGGTCCCCACAACCGAAGTGATTAGCACAGCCCCTGTGCAAATGCCTACCGCTGAGTCCACCGGA 

TRP2 #11 

KKRVHPDYVITTQHWLGLLGPNGTQPQFAN
AAGAAAAGGGTCCACCCTGACra^TGTXSATTACCaCACAGCATTGGOT 

gp100 #38

LHQILKGGSGTYCLNVSLADTNSLAVVSTQ
CTGCATCS^GATTCTGAAAGGCGGAAGOSGAACOTATTGCCTCAACGTCAGCCTCXSCCXSA 

gp100 #30

PEPEGPDASSIMSTESITGSLGPLLDGTAT
CCCGAACCCGAAGGCCCTGACGCTAGCTCCSVTCATGAGCACAaAGTCCATCACAGGCTCCCTG 

gp100 #31

SITGSLGPLLDGTATLRLVKRQVPLDCVLY
AGCATTACCGGAAGCCTCGGCCCTCTGCTCGACGGAACCGCTACCCTCyiGGCTCGTGAAAAGGaU^GTGCCTCTGGATTGCGTCCrGTAT 

gp100 #5

DCWRGGQVSLKVSNDGPTLIGANASFSIAL
GACTGTTGGAGAGGCGGACAGGTCAGCCTCAAGGTCAGCAATGACGGACCCACACTGATTGGCGCTAACGCTAGCTTTAGCATTGCCCTC
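
The scrambled entries above contain the same segments as the ordered listing, and the synthetic protein that follows reads as their end-to-end concatenation in the scrambled order. The sketch below illustrates that step in Python. How the order was actually chosen is not stated in this listing, so the seeded `random.Random(...).shuffle` call, the `seed` parameter and the helper name `scramble_and_join` are assumptions made purely for illustration.

```python
import random


def scramble_and_join(segments, seed=0):
    """Shuffle (label, peptide) pairs and concatenate the peptides in the new order.

    The seeded shuffle is a stand-in: the listing does not say how the
    scrambled order was generated.
    """
    order = list(segments)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(order)
    protein = "".join(peptide for _, peptide in order)
    return order, protein


# First two entries of the scrambled listing, joined in the order given there.
listed = [
    ("gp100 #4", "WNRQLYPEWTEAQRLDCWRGGQVSLKVSND"),
    ("TRP2 #6", "PYILRNQDDRELWPRKFFHRTCKCTGNFAG"),
]
print("".join(peptide for _, peptide in listed))

# A fresh scramble of the same entries (the order depends on the assumed seed).
order, protein = scramble_and_join(listed, seed=1)
print([label for label, _ in order], len(protein))
```

Joining the two listed entries in the given order reproduces the opening residues of the synthetic protein, which is the basis for reading that sequence as a plain concatenation.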

Synthetic Protein: 



WNRQLYPEWTEAQRLDCWRGGQVSLKVSNDPYILRlIQDDRELWPRKPPHRTCKCTGNFAGRNGDFFISSKDriGYDY 

HRYHLLRLEKDMQEMLQEPSFSGHWRESYMVPPIPIiYRNGDFFISSKI)IiGYDI*LCriERDLQRLXGNESPALPYWNFATGRNBTT^ 

PSGTTSVQVPTTEVSTDYYQELQRDISEMPLQIYKQGGFLGLSNACMEXSSPGCQPPAQRLCQPVIiPSPACQLVDQLGYSYAlDLPVSVEETPGW 

LLVVMGTEDGPIRRNPAGirVARPlWQRLPEPQDVAQCMTVDSLWKECCPRLGAESANVCGSQ 

MS PLWWGFIiLS CLGCKIIjPGAQGQFPRVADIjS YTWDFGDS SGTLI SRALVVTHTYIjEPriAEMSTPEArGMTPAEVS I WLS6TTAAQVXKPRPGSVW 
QLTLAFREGTINVHDVETQFGSAATWGQDVTSVPVTRPALGSTTPPAHDVLHKRQRPVHQGFGIjKGAVTLTILLGIFFLCIiALIIC^ 
SQELRRTLKEVLKFFHRTCKCTGNFAGYNCGDCKFGWTGPNCLSIiQKFDNPPFFQNSTFSFRNALiSGFDKADSKSTPFSIPSHHSDTPTTriASHSTKT 
DASSAANRPAIiGSTAPPVHlSnnrSASGSASGSASTCNGTYEGIjljRRNQMGRNSMKIiPTLKDIRDCTHHSSVPPL^ 
YDHVAVLLCLVVFFLAMLVLMAVLYVKLTGDENFTIPYWDWRDAEKCD

LSNIKPRPGSVWQLTIlAVIDVITCSS^mSSLCFLGAIAVDRYISIFYRNPGIraDKSRTPRX.PSSOT 

PIGHmQYNMVSIiADTNSIiAWSTQblMPGQEAGLGQVPLGPVTAQWLQAAIPLTSCGSSPVPGTTDGHKFGPWGPNCTERRL 
DKLGTHTMEVTVYHRRGSRSYVPLMJSSSAPTAVAAIWVASWPSTLPIAYYDHVAVLL^ 
RRNGYRALMDKSLHVGTQCALTRRPPraRLFLLRWEQElQKLTGDENFTIPYWDWAAI^VQGSQRRLLGS 
MYCFICCIiALSDLLVSQSSMHNiiliHIYMNGTMSQVQGSAN 

AVILLLEADPTLISRNSRFSSWETVCDSLDDYNHLVTIiTRPALGSTTPPAHDVTSAPDNKAARDAEKCDICTDEYMGGQHPTNPN^ 

HDPSGYIjAEADIiSYTWDFGDS SGTS S3^VEFCXjSIjTQYE SGSMDKAANF SFRHTGPTl.XG?iNASl? S 1 AIjNFPGSQKVLPDGQVI^GPFFLHIiTLIVXiC 

PEHPTCGCIFKNFNLFCQCSGWFMGFNCGNCKFGFWGPNCTERRLLQYRRLRKGYTPLMETHLSSKRYTEEAAAPLENAPIGHNRQYNIW 

TEMFVTNFPGSQKVLPDGQVIWVmTIINGSQVWGGRPTAEAPNTTAGQVPTTEWGTTPGQAPTASTPGGEKETSATQRSSVPSSTEKNAVSMTSM 

YRRRLMKQDFSVPQLPHSSSHWIjRLPRILGLLGPNGTQPQFANCSVYDFFWLHYYSVCLEVGLFDTPPFYSNSTNSFRNTVEGYSDPA^ 

LLHIAVIGALIAVGATKVPRNQTGARCLEVSISDGLFIjSLGLVSLVENALSGSMDKAANFSFRNTLEGFASPLTGIA^ 

NAPLGPQFPFTGMHYYVSmALLGGSEIWRDIDFAHEAPAFIiEEKQPLLMEKEDYHSLYQSHLAAGQCTEVRADTRPWSG 

SSTEKNAVSMTSSVLSSHSPGSGSSTTTPMF]^INIYDIJFW!yIHYYVS^mAIJLGGSEQPVYPQETDDACIFPDGGPCPSGSWSQKRS 

LCNSTEDGPIRRNPAGNVAWWNTIINGSQWGGQPVYPQETDDACIFPQEKNCEPWPNAPPAYEKLSAEQSPPPYSPSRSYVPI^ 

VPF SVSVSQIjRLEKDMQEMLQEP S FSLPYWNFATGKNVCD I VPPWPPVTNTEMPVTAPDNLGYTYEAACS vydffvwlhyysvrdtllgpgrpyraid 

VRRNXFDLSAPEKDKFFAYLTIiAKHTISSDKKGHGHSYTTAEEAAGlGILTVIIiGVLLDIPVYWKTWGQYW 

igtgramlgthtmevtvyhrrgistapvqmptaestgmtpekvpvsevmgttfsswqivcsrleeynshqslcngtpegpbrdpifvv^ 
ewmkrfnppadawphmlaracqhaqgiariihkrqrpvhqgfgrikriltvltvvtgsghasstpgg^ 

RNALEGFDKADGTH^SQVMSIiKNIiVHSHQSLCaiGTPEGPIiRRNPGmDKSRTPR^ 
HSLSPQERBQFLGALDIAQEIiAPIGHNRMYNMVPFFPPVTNBELPLTSEVSIWIiSGOT 
LQFNSSnEDPYHTHGRYVPPSSTDRSPYEKVSAGNGGSSIiIiFLSIKSLVSLVENALWATIAKNRNLHSPM 
IPAVCQaUtKNYGQLDIFPARDTYHPMSEYPTOTFLAMIiVLI^VIiY^ 

ATGKNVCDICTDDIjMGSRSNPDSTITDQVPFSVSVSQLRAIjDGGNKHFLRNQPLFHISNLQFNSSLEDPSTDYYQELQRDISEM 
LSYTNPAVAAASANIAYVIPIGTYGQMKNGSTPMFNDINIYDLPWRLCQPVIiPSPACQLVLHQILKGGSGTYC]^ 

avpsgegdhhafvdsifeqwlqrhrplqbvypeanapictddlmgsrsnfdstlispnsvfsqwrwcfpardtyhpmseyptyhthgryvppss^^ 

sytnpavaaasaniaaehptcgcifknfnlfialiicnaiidpliymsqvqgsandpifllhhafvdsifeqwlqrrnsm 

nppffqnslisralwthtylepgpvtaqwlqaaiplsfalpywnfatgrmecdvctdqlfgaarpdvigaliavgatkvprnqdwlgvsrqlrtka 

ALDGGMKHFLRNQPLTFALQIiHDPSGYIAECDVCTDQIiFGAARPDDPTIilSRNSRFSSWEAAMPREDAHFIYGYPKKGHGHSYTTAEEAATGKY 
RSLHNriAHIiFLNGTGGQTHLSSSDVSVSDVPFPFSAQSGAGVPGWGIALLVLSPQBREQFLGALDIAKXRVHPDYVITTQHWFFAYLT^ 
VrPIGTYGQMKNGSGTNVLETAVILLLEAGALVARAAVXiQQLDNVCVLVAIiAIVYLIArAVCQCRRi^ 
PNAPPASVLSSHSPGSGSSTTQGQDVTIAPATEPASDEWMKRFNPPADAWPQEIAPlGHNRIvnn^ 

HRYHBLCIiERDLQRXjIGNERPMVQRLPEPQDVAQCLEVGLFDTPPFYSNQDPIFVLLHTFTDAVFDEWLRRYNADISTPLGAESANVCGSQQGRGQCT 

EVRADTRPWSGYNCGDCKFGWTGPNCERKKPPVIRQNIHSIjHIiFIjNGTGGQTHLSSQDPIFVLLHTPTDAVGIiVSL 

YHSGCKIIiPGAQGQFPRVCMTVDSLVNKECCPRQLPHSSSHWLRIiPRIFCSCPlGENSPLIiSGDGGPCPSGSWSQKRSPV^ 

GVSRQIiRTKAWNRQLYPEWTEAQRLIWRDIDFAHEAPAFLPWHRLFLLRWEQEIQAAMTPGTQSPFFLLLLLTVLWVTGSGHASDKSL^ 

RRCPQEGFDHRDSKVSIJm'SASGSASGSASTLVHNGTSAIWiTTTPALEGPASPLTGIMJASQSSMHWAI^ 

AFVTWHRYHAAMLLAVLYCLLWSFQTSAGHPPRACTSSKAPELWSCQGGLPKEACft^ 

CESAEILQAVPSGEGDAFEIiTVSCQGGLPKETVODSIjDDYiraiiVTIjCNGTYEGIjIiRM 

QSPPPYSPAAIVGILLVLMAWIiASLIYRRRIiMKQDPSVPGIGILTVILGVIjLLIGCM^ 

WLQGQDVTLAPATEPASGSAATWGQDVTSVPVTSCGSSPVPGTTDGHRPTAEAPNTTAGQVPLVHNGTSARATTTPASK^ 

PTTLLVVMGTLV?UJVGLFVLIiAPLTTTEWVETTAREIjPIPEPEGPDASSXMSTEGAVTLTIIjLGIFFLCWGPFFL^ 

AGLVSLLCRHKRKQIiPGALVARAAVLQQLDlTVIDVITCSSMIiSSLCMTPEKVPVSEVMGTTIiAEMSTPEAT^^ 

CPPWSGDRAIjRYHSlVTIjPRAPRAVAAIWASWFSTIjFREGTINVHDVETQFNQYKTEAASRYNLTINIiMEKECCPPWSG 

IKSYLEQASRlWSWIiliGAAlWGAVLTALIAPTTLASHSTKTDASSTHHSSVPPLTSSmiSSGAGVPGWGIA^ 

RYI S I FYALRYHSIVTLPRAPRLIMPGQEAGLGQVPLI VGILLVLMAVVIASTIiVALVGLFVIirAFLQYRRLRKGYTP 

SLEDYDTLGTLCNSIiNSTPTAIPQLGIA^^QTGARCLEVSISDGHRPLQEVYPEMIAPIGHNRESYWPFIPLYEPSGT 

TAESTGKKRVHPDYVITTQHWLGLLGPNGTQPQFANLHQXLKGGSGTYCLNVSIxADTNSLAWSTQPEPEGPDASSIMSTESITGSLGP 

TGSIjGPLLDGTATLRLVKRQVPIiDCVIjYDCWRGGQVSIjKV'SNDGPTIiIGANASF S lAL 

Synthetic DNA:



TGGAATAGGCAACTGTATCCCGI^TGGACAGAGGCTCAGAGACTGGATTGCTGGAGGGGAGGCCRAGTGTCCCTGAAAGTGTCa^CGAT 

cctcaggaatcaggatgacagagagctctggcctaggaaattctttcacagaacctgtaagtgtaccggaaactttgccggaaggaatggcgatttct 

TTATCTCCAGCA2^GACCTCGGCTATGACTATAGCTATCTGCAA6ACTCCGACCCTGACTCCTTCCAAGACTATGCCGCT^ 

cacagataccatctgctcaggctcgagaaagacatgcaggaaatgctccaggaaccctccttctccggccataacagagagtcctacatggt^ 

CATTCCCCTCTACAGAAACGGAGACTTTTTCATTAGCTCCMlGGATCTGGGATACGATCTGCTCTGCCTCGAGAGAGACCTCCAGAGACTGATTGGC^ 

ATGAGTCCTTCGCTCTGCCTTACTGGAACTTTGCa^CAGGCa^GAAACGAAACCACAGJySGTCGTGGG 

CCCTCCGGCAO^CCTCCGTGCAAGTGCCTACCSiCAGAGGTCAGmCAGACTATTACCaAGAGCTCCAGAGAGACAT^^ 

CTATAAGCAAGGCGGATTCCTCGGCCTC?i6CAATGCCTGTATGGAAATCTCCAGCCCTGGCTGTCAGCCTCCCGCTCAGAGACTGTGT^^^ 

TCCCCTCCCCCGCTTGCCAACTGGTCGACCAACXGGGATACTCCTACGCTATCGATCTGCCTGTGTCCGTGGAAGAGACACCCGGATGGCCTACCACA 

CTGCTCGTGGTCATGGGAACCGAAGACGGACCCATTAGGAGAAACCCTGCCGGAAACGTCGCCAGACCCATGGTGCAAAGGCTCCCCGAACCCCAAGA 

CGTCGCCCAATGCATGACCGTCGACTCCCTGGTCAACAAAGAGTGTTGCCCTAGGCTCGGCGCTGAGTCCGCCAATGTGTGTGGCTCCCAGCAAGGCA 

GAAACCAATACAAAACCGAAGCCGCTAGCAGATACAATCTGACAATCTCCGACGTCAGCGTCAGCGATGTGCCTTTCCCTTTCTCCGCCCAAGCCGCT 

ATGTCCCCCCTCTGGTGGGGCTTTCTGCTCAGCTGTCTGGGATGCAAAATCCTCCCCGGAGCCCAAGGCCAATTCCCTAGGGTCGCCGATCTGTCCTA 

CACATGGGATTTCGGAGACTCCAGCGGAACCCTCATCTCCAGGGCTCTGGTCGTGACACACACATACCTCGAGCCTCTGGCTGAGATGAGCACACCCG 

AAGCCACAGGCATGACCCCTGCCGAAGTGTCCATCGTCGTGCTCAGCGGAACCACAGCCGCTCAGGTCATCAAATTCAGACCCGGAAGCGTCGTGGT^ 

CAGCTCACCCTCGCCTTTAGGGAAGGCACAATCAATGTGCATGACGTCGAGACACAGTTTGGCTCCGCCGCTACCTGGGGCCAAGACGTCACCTCCGT 

GCCTGTGACAAGGCCTGCCCTCGGCTCCACCACACCCCCTGCCCA-TGACGTCCTGCATAAGAGACAGAGACCCGTCCACCAAGGCTTTGGCCTCAAGG 

GAGCCGTCACCCTCACCATTCTGCTCGGCATTTTCTTTCTGTGTCTGGCTCTGATTATCTGTAACGCTATCATTGACCCrCTGATTTACGCTTTCC^ 

AGCCAAGAGCTCAGGAGAACCCTCAAGGAAGTGCTCAAGTTTTTCCATAGGACaTGCAAATGCACAGGCAATTTCGCTGGCTA^^ 

CAT^TTCGGATGGACAGGCCCTAACTGTCTGTCCCTGCAAAAGTTTGACAATCCCCCTTTCTTTCAGAATAGCACATTCTCCTTCAGAAACGCTCT 
AAGGCTTTGACAAAGCCGATAGCAAAAGCACACCCTTTAGCATTCCCTCCCACCATAGCGATACCCCTACCACACTGGCTAGCCATAGCACAAAGACA

GACGCTAGCTCCGCCGCTAACAGACCCGCTCTGGGAAGCACAGCCCCTCCCGTCCACAATGTGACAAGCGCTAGCGGAAGCGCTAGCGGAAGCGCTAG 

CACATGCAATGGCACATACGAAGGCCTCCTGAGAAGGAATCAGATGGGCAGAAACTCCATGAAACTGCCTAGCCTCAAGGATATCAGAGACTGTACCC 

ATCACTCCAGCGTCCCCCCTCTGACAAGCTCCAACCATAGCACAAGCCCTCAGCTCAGCACAGGCGTCAGCTTTTTCTTTCTGTCCTTCATTGCCTAT 

TACGATCACGTCGCCGTCCTGCTCTGCCTCGTGGTCTTCTTTCTGGCTATGCTCGTGCTCATGGCTGTGCTCTACGTCAAGCTCACCGGAGACGAAAA 

CTTTACCATTCCCTATTGGGATTGGAGAGACGCTGAGAAATGCGATATCTGTACCGATGAGTATATGGGACTGAGACTGGTCAAGAGACAGGTCCCCC 

TCGACTGTGTGCTCTACAGATACGGAAGCTTTAGCGTCa%.CCCTCGACATTGTGCAAGGCATTTTCCTCCAGATTTACAAACAGGGAGGCTTTCTGGGA 

CTGTCCAACATTAAGTTTAGGCCTGGCTCCGTGGTCGTGCy^CTGACy^CTGGCTGTGATTGACGTa^TCACATGCTCa^G^ 

CTTTCTGGGAGCCATTGCCGTCGACAGATACATTAGCATTTTCTATAGGAATCCCGGAAACCATGACAAAAGCAGAACCCCTAGGCrCCCCT 

CTGACGTCGAGTTTTGCCTCAGCCTCACCCAATACGAATTCGATGAGTGGCTGAGAAGGTATAACGCTGACATTAGCACATTCCCTCI^^ 

CCCATTGGCCATAACAGACAGTATAACATGGTGTCCCTGGCTGACACAAACTCCCTGGCTGTGGTCAGCACACAGCTCSiTCATGCCCGGA^ 

CGGACTGGGACAGGTCCCCCTCGGCCCTGTGACAGCCCAAGTGGTCCTGCAAGCCGCTATCCCTCTGACAAGCTGTGGCTCCAGCCGTGTGCCTGGCA 

CAACCGATGGCCATAAGTTTGGCTTTTGGGGACCCAATTGCACAGAGAGAAGGCTCCTGGTCAGGAGAAACATTTTCGATCTGTCCGCCCCTGAGAAA 

GACAAACTGGGAACCCATACCATGGAGGTCACCGTCTACCATAGGAGAGGCTCCAGGTCCTACGTCCCCCTCGCCCATAGCTCCAGCGCTTTCACAGC 

CGTCGCCGCTATCTGGGTGGCTAGCGTCGTGTTTAGCACACTGTTTATCGCTTACTATGACCATGTGGCTGTGCTCCTGTGTCTGGTCGGCACACTGG 

ATAGCCAAGTGATGAGCCTCCTiCMTCTGGTCCACTCCTTCCTCAACGGAACCAATGCCCTCCCCCATAGCGCTGCCAATGGCTGTTGGTATTGCAGA 

AGGAGAAACGGATACAGAGCCCTCATGGATAAGTCCCTGCATGTGGGAACCCAATGCGCTCTGACAAGGAGACCCTGGai 

GTGGGAGCAAGAGATTCAGAAACTGACAGGCGATGAGAATTTGACAATCCCTTACTGGGACTGGGCCGCTATGGCTGTGCAAGGCTCC^ 

TCCTGGGAAGCCTCAACTCCACCCCTACCGCTATCCCTCAGCTCGGCCTCGCCGCTGTGGTCGCCACAATCGCTAAGAATAGGAATCTGCATAGCCCT 

ATGTATTGCTTTATCTGTTGCCTCGCCCTCAGCGATCTGCTCGTGTCCCAGTCCAGCATGCACAATGCCCTCCACATTTACATGAACGGAACCATGAG 

CCAAGTGCAAGGCTCCGCCAATGACCCTATCTTTCTGCTCGGCCAACACCCTACCAATCCCAATCTGCTCAGCCCTGCCTCCTTCTTTAGCTCCTGGC 

AAATCGTCTGCTCCAGGGTCGAGGAATACAATTACTGTTTCATTTGCTGTCTGGCTCTGTCCGACCTCCTGGTCAGCGGAACCAATGTGCTCGAGACA 

GCCGTCATCCTCCTGCTCGAGGCTGACCCTACCCTCATCTCCAGGAATAGCAGATTCTCCAGCTGGGAGACAGTGTGTGACTCCCTGGATGACTATAA 

CCATCTGGTCACCCTCACCAGACCCGCTCTGGGAAGCACAACCCCTCCCGCTCACGATGTGACAAGCGCTCCCGATAACAAAGCCGCTAGGGATGCCG 

AAAAGTGTGACATTTGCACAGACGAATACATGGGCGGACAGCATCCCACAAACCCTAACCTCCTGTCCCCCGCTAGCTTTACCTTTGCCCTCCAGCTC 

CACBATCCCTCCGGCTATCTGGCTGAGGCTGACCTl^GCTATACCTGGGACTTTGGCGATAGCTCCGGCACAAGCTCCGCCGATGTGGAATTCTGTC^ 

GTCCCTGACACAGTATQAGTCCGGCTCCATGGATAAGGCTGCCAATTTCTCCTTCAGAAACACTiLGGCCCTACCCTCAT^^ 

CCATCGCrCTGAATTTCCCTGGCTCCCAGAAAGTGCTCCCCGATGGCCyVAGTGATTTGGGGACCCTTTTTCCTCCa.CCTCACCCTCATC6TCCTGTO 

CCCGAACACCCTAGCTGTGGCTGTATCTTTAAGAATTTCMTCTGTTTTGCCAATGCTCCGGCAATTTCATGGGCTTTAACTGTGGCAATTGCAAAT 

CGGATTCTGGGGCCCTAACTGTACCGAAAGGAGACTGCTCCAGTATAGGAGACTGAGAAAGGGATACACACCCCTCATGGAAACCCATCTGTCCAGCA 

AAAGGTATACCGAAGAGGCTGCCGCTCCCCTCGAGAATGCCCCTATCGGACACAATAGGCAATACAATATGGTCCCCTTTTGGCCTCCC6TCACCAAT 

ACCGAAATGTTTGTGACAAACTTTCCCGGAAGCCAAAAGGTCCTGCCTGACGGACAGGTCATCTGGGTGAATAACACAATCATTAACGGAAGCCAA6T 

GTGGGGCGGAAGGCCTACCGCTGAGGCTCCCAATACCACAGCCGGACAGGTCCCCACAACCGAAGTGGTCGGCACAACCCCTGGCCAAGCCCCTACCG 

CrAGCACACCCGGAGGCGAAAAGGAAACCTCCGCCACACAGAGAAGCTCCGTGCCTAGCTCCACCGAAAAGAATGCCGTCAGCATGACCTCCCTGATT 

TACAGAAGGAGACTGATGAAGOUiGACTTTAGCGTCCCCCAACTGCCTCACTCCAGCTCCCACTGGCTGAGACTGCCTAGGATT 

CCCTAACGGT^CCCAACCCOVATTCGCTAACTGTAGCGTCTACGATTTCTTTGTGTGGCTGCATTACTATAGCGTCTGCCTCGAGGTCGGCCTCTTCG 

ATACCCCTCCCTTTTACTCCUyiCTCCy^CCAATAGCTTTAGGAATACCGTCGAQGQATACTCCGACCCTGCCGCTATGGATCTGGTCCTGAAAAGGTG 

CTGCTCCACCTCGCC6TCATCGGAGCCCTCCTGGCTGTGGGAGCCACAAAGGTCCCCAGAAACCAAACCGGAGCCAGATGCCTCGAGGTCAGCATTAG 

CGATGGCCTCTTCCTCAGCCTCGGCCTCGTGTCCCTGGTCGAGAATGCCCTCAGCGGAAGCATGGACAAAGCCGCTAACTTTAGC^^ 

TCGAGGGATTCGCTAGCCCTCTGACAGGCATTGCCGATGCCTCCAGCCCTTGCGGACAGCTCAGCGGAAGGGGAAGCTGTCAGAATATCCTCCTGTCC 

AACGCTCCCCTCGGCCCTCAGTTTCCCTTTACCGGAATGCATTACTATGTGTCCATGGATGCCCTCCTGGGAGGCTCCGAGATTTGGAGAGACATTGA 

CTTTGCCCATGAGGCTCCCGCTTTCCTCGAGGAAAAGCAACCCCTCCTGATGGAGAAAGAGGATTACCATAGCCTCTACCTUiAGCCATCTGGCTGCCG 

GCCAATGCACAGAGGTCAGGGCTGACACAAGGCCTTGGTCCGGCCCTTACATTCTGAGAAACCAAGACGATAGGGAACTGTGGCCCAGAAGCGTCCCC 

TC<^GCACAGAGAAAAACGCTGTOTCCATGAC3^GCTCCGTGCTCAGCTCCCACTCCCCCGGAAGCGGAAGCTCCACCACAACCCCTATGTTTi^ 

TATCAATATCTATGACCTCTTCGTCTGGATGCACTATTAC6TCAGCATGGACGCTCTGCTCGGCGGAAGCGAACAGCCTGTGTATCCCCAAGAGACAG 

ACGATGCCTGTATCTTTCCCGATGGCGGACCCTGTCCCTCCGGCTCCTGGTCCCAQAAAAGGTCCGACTCCCTGGAAGACTATGACACACrGGGAACC 

CTCTGCAATAGCACAGAGGATGGCCCTATCAGAAGGAATCCCGCTGGaiATGTGGCTTGGGTCM.CAATACCaT 

AGGCCAACCCGTCTACCCTCAGGAAACCGATGACGCTTGCATTTTCCCTCAGGAAAAGAATTGCGAACCCGTCGTGCCTAACGCTCCCCCTGCCTATG 

AGAAACTGTCCGCCGAACAGTCCCCCCCTCCCTATAGCCCTAGCAGAAGCTATGTGCCTCTGGCTCACTCCAGCTCCGCCTTTACCATTACCGATCAG 

GTCCCCTTTAGCGTCAGCGTCAGCCAACTGAGAGTGGAAAAGGATATGCAAGAGATGCTGCAAGAGCCTAGCTTTAGCCTCCCCTATTGGAATTTCGC 

TACCGGAAAGAATGTGTGTGACATTGTGCCTTTCTGGCCCCCTGTGACAAACACAGAGATGTTCGTCACCGCTCCCGATAACCTCGGCTATACCTATG 

AGGCTGCCTGCTCCGTGTATGACTTTTTCGTCTGGCTCCACTATTACTCCGTGAGAGACACACTGCTCGGCCCTGGCA6ACCCTATAGGGCTATCGAT 

GTGAGAAGGAATATCTTTGACCTCAGCGCTCCCGAAAAGGATAAGTTTTTCGCTTACCTCACCCTCGCCAAACACACAATCTCCAGCGATAAGAAAG^ 

CCATGGCCATAGCTATACCACAGCCGAAGAGGCTGCCGGAATCGGAATCCTCACCGTCATCCTCGGCGTCCTGCTCCTGATTTTCGTCTACGTCTGGA 

AAACCTGGGGCCAATACTGGCAGGTCCTGGGAGGCCCTGTGTCCGGCCTCAGCT^TTGGCAaiGGCAGAGCCATGGGCGGACCC^ 

ATCGGAACCGGAAGGGCTATGCTCGGCACaCSiCACAATGGAAGTGACAGTGTATCaCAGAAGGGGAATCTCCaiCCGCTCCCGTCCAGATGC 

CGAAAGCACaGGCaTGACCCCTGAGAAAGTGCCTGTGTCCGAGGTCaTGGGAACCACATTCTCCy^GCTGGCAGATTGTQTGTAGa^ 

ATAACTCCCACCAAAGCCTCrraCAATGGCACACCCGAAGGCCCTCTGAGAGACCCTATCTTTGTGGTCCTGCATAGCTT^ 

GAGTGGATGAAAAGGTTTAACCCTCCCGCTGACGCTTGGCCTCACATGCTGGCTAGGGCTTGCCAACACGCTCAGGGAATCGCTAGGCTCGAOU^ 

GCAAAGGCCTGTGCATCAGGGATTCGGACTGAAACTGCTCACCGTCCTGACAGTGGTCACCGGAAGCGGACACGCTAGCTCCACCCCTGGCGGAGAGA 

AAGAGACAAGCGCTACCCAAAGGTCCTTCTGTAGCTGTCCCATTGGCGAAAACTCCCCCCTCCTGTCCGGCCAACAGGTCGCCGCTACCTTTAGCTTT 

AGGAATGCCCTCGAGGGATTCGATAAGGCTGACGGAACCCTCGACTCCCAGGTCATGTCCCTGCATAACCTCGTGCATAGCCATCAGTCCCTGTGTAA 

CGGAACCCCTGAGGGACCCCTCAGGAGAAACCCTGGCAATCACGATAAGTCCAGGACACCCAGACTGCCTCCCTTTTTCCCTCCCGTCACCAATGAGG 

AACTGIOTCTGACS^AGCGATCAGCTCGGCTATAGCTATGCCATTGACCTCCCCGTCAGCGTCGAGAGAAAGA^ 

O^CTCC CreT CCCCCCAAGAGAGAGAGCAATTCCTCGGCGCTCTGGATCTGGCTCAGGAACTGGCTCCCATTGGCCATAAC^ 

QCCTTTCOTTCCCCCTGTGACAAACKSAAGAGCTCTTCCTCACCTCCGAGGTCAGCATTGTGGTCCTGTCCGGC^ 

CAGAGTGGGTGGAAACCACAGCCA6AGAGCTCCCCATTACCTCCCCCCAACTGTCCACCGGAGTGTCCTTCTTTTTCCTCA6CT 

CTGCAATTCAATAGCTCCCTGGAAGACCCTTACCaTACCCATGGCAGATACGTCCCCCCTAGCTC(^CCGATAGGTCCCCCTATGAGAAA6TGTC^^ 

CGGAAACGGAGGCTCCAGCCTCCTGTTTCTGTCCCTGGGACTGGTCAGCCTCGTGGAAAACGCTCTGGTCGTGGCTACCATTGCCAAAAACAGAAACC 

TCCACTCCCCCATGAGCTTTCT6AATGGCACAAACGCTCTGCCTCACTCCGCCGCTAACGATCCCATTTTCGTCGTGCTCCACTCCTTCACAGACGCT 

ATCTTTGCCGTCTGCCAATGCAGAAGGAAAAACrATGGCCAACTGGATATCTTTCCCGCTAGGGATACCTATCACCCTATGTCCGAGTATCCCACAGT 

GTTTTTCCTCGCCATGCTGGTCCTGATGGCCGTCCTGTATGTGCATATGCTCGCCAGAGCCTGTCAGCATGCCCAAGGCATTGCCAGAAGCACAAACT 
CCTTCAGAAACACAGTGGAAGGCTATAGCGATCCCACAGGCAAATACGATCCCGCTGTGAGAAGCCTCCACAATCTGGCTCTGCCTTACTGGAACTTT
GCCACAGGCAAAAACGTCTGCGATATCTGTACCGATGACCTCATGGGAAGCAGAAGCAATTTCGATAGCACAATCACAGACCAAGTGCCTTTCTCCGT 
GTCCGTGTCCCAGCTCAGGGCTCTGGATGGCGGAAACAAACACTTTCTGAGAAACCAACCCCTCTTCCATATCTCCAACCTCCAGTTTAACTCCAGCC 
TCGAGGATCCCTCCACCGATTACTATCAGGAACTGCAAAGGGATATCTCCGAGATGAGCCCTTACGAAAAGGTCAGCGCTGGCAATGGCGGAAGCTCC 
CTGTCCTACACAAACCCTGCCGTCGCCGCTGCCTCCGCCAATCTGGCTTACGTCATCCCTATCGGAACCTATGGCCAAATGAAAAACGGAAGCACACC 

TCAAGGGAGOCTCCGGCT^CATACTGTCTGAATAGGTATGGCTCCTTCTCCGTGACACrGGATATCGTCCAGGGAATCGAM 
GCCGTCCCCTCCGGCGAAGGCGATCACCS^TGCCITTGTGGATAGCATTTT^^ 

GGCTAACGCTCCCATTTGCACAGACGATCTGATGGGCTCCAGGTaZAACrTTQACTCCACCCTCATCTCCCCCAATAGCGTCTTCTCCC^ 

TCGTGTGTTTCCCTGCCAGAGACACATACCATCCCATGAGCGAATACCCTACCTATCACACACACGGAAGGTATGTGCCTCCCTCCAGCACAGACAGA 

AGCTATACCAATCCCGCTGTGGCTGCCGCTAGCGCTAACCTCGCCGCTGAGCATCCCACATGCGGATGCATTTTCAAAAACTTTAACCTCTTCCTCGC 

CCTCATCATTTGCAATGCCATTATCGATCCCCTCATCTATATGTCCCAGGTCCAGGGAAGCGCTAACGATCCCATTTTCCTCGTGCATCT^CGCTTTCG 

TCGACTCCATCTTTGAGCAATGGCTCCAGAGAAGGAATAGCATGAAGCTCCCCACACTGAAAGACATTAGGGATTGCCTCAGCCTCCAGAAATTCGAT 

AACCCTCCCTTTTTCCAAAACTCCCTGATTAGCAGAGCCCTCGTGGTCACCCATACCTATCTGGAACCCGGACCCGTCACCGCTCAGGTCGTGCTCCA 

GGCTGCCATTCCCCTCAGCTTTGCCCTCCCCTATTG6AATTTCGCTACCGGAAGGAATGAGTGTGACGTCTGCACAGACCAACTGTTTGGCGCTGCCA 

QACCCGATGTGATTGGCGCTCT6CTCGCCGTCGGCGCTACCAAAGTGCCTAGGAATCAGGATTGGCTCGGCGTCAGCAGACAGCTCAGGACAAAGGCT 

GCCCTCGACGGAGGCAATAAGCATTTCCTCAGGAATCAGCCTCTGACATTCGCTCTGCAACTGa^TGACCCTAGCGGATACCTC^ 

GTGTACCGATCAGCTCTTCGGAGCCGCTAGGCCTGACGATCCCACACTGATTAGCAGAAACTCCAGGTTTAGCTCCTOGGAAGCCGCTATGCCTA^ 

AAGACGCTCACTTTATCTATGGCTATCCCAAAAAGGGACACGGACACTCCTACACAACCGCTGAGGAAGCCGCTACCGGAAAGTATGACCCTGCCGTC 

CTTTAGCGCTCAGTCCGGCGCTGGCGTCCCCGGATGGGGAATCGCTCTGCTCGTGCTCAGCCCTCAGGAAAGGGAACAGTTTCTGGGAGCCCTCGACC 

TCGCCAAAAAGAGAGTGCATCCCGATTACGTCATCACAACCC?^CACTGGTTCTTTGCCTATCTGAC7iCTGGCTAAGCATACC^^^ 

GTGATTCCCATTGGCACATACGGACAGATG2Ui.GAATGGCTCCGGCACAAACGTCCTGGAAACCGCTGTGATTCTGCTCCTGGAAGCCGGAGCCCXCGT 

GGCTAGGGCTGCCGTCCTGCAACAGCTCGACAATGTGTGTGTGCTCGTGGCTCTGGCTATCGTCTACCTCATCGCTCTGGCTGTGTGTCAGTGTAGGA 

6AAAGAATTACGGACAGCTCGACATTTGCCCTCAGGAAGGCTTTGACCATAGGGATAGGAAAGTGTCCCTGCAAGAGAAAAACTGTGAGCCTGTGGTC 

CCCAATGCCCCTCCCGCTAGCGTCCTGTCCAGCCATAGCCCTGGCTCCGGCTCCAGCAakACCCyUiGGCCaiAGACGT 

GCCTGCCTCCGACGAATGGATGAAGAGATTCAATCCCCCTGCCGATGCCTGGCCCCAAGAGCTCGCCCCTATCGGACACAATAGGATGTACAATATGG 

TCGCCTTTCACTCCCAGGAACTGAGAAGGACACTGAAAGAGGTCCTGACATGCTCCTGGGCTGCCTTCTCCOiCCAAGGCCCrGCCTTTGTGACA 

CATAGGTATCACCTCCTGTGTCTGGAAAGGGATCTGCAAAGGCTCATCGGAAACGAAAGGCCTATGGTCCAGAGACTGCCTGAGCCTCAGGATGTGGC 

TCAGTGTCTGGAAGTGGGACTGTTTGACACACCCCCTTTCTATAGCAATCAGGATCCCATTTTCGTCCTGCTCCACACATTCACAGACGCTGTGTTTG 

ACGAATGGCTCAGGAGATACAATGCCGATATCTCCACCTTTCTGGGAGCCGAAAGCGCTAACGTCTGCGGAAGCCAACAGGGAAGGGGACAGTGTACC 

GAAGTGAGAGCCGATACCAGACCCTGGAGCGGATACAATTGCGGAGACTGTAAGTTTGGCTGGACCGGACCCAATTGCGAAAGGAAAAAGCCTCCCGT 

CATCAGACAGAATATCCATAGCCTCCACCTCTTCCTCAACGGAACCGGAGGCCAAACCCATCTGTCCAGCCAAGACCCTATCTTTGTGCTCCTGCATA 

CCTTTACCGATGCCGTCGGCCTCGTGTCCCTGCTCTGCAGACACAAAAGGAAACAGCTCCCCGAAGAGAAACAGCCTCTGCTCATGGAAAAGGAAGAC 

TATCaCTCCGGCTGTAAGATTCTGCCTGGCGCTCAGGGACAGTTTCCCAGAGTGTGTATGACAGTGGATAGCCTCGTGAATAAG 

AC:aGCrCCCCCATAGCrCC]AGCCS\TTGGCTCAGGCTCCCCyVGAATCTTTTGCTCCTGCCCTATCGGAGAG^ 

GCCCTTGCGCTAGOSGAAGCTGGAGCCAAAAGAGAAGCTTTGTGTATGTGTGGAAGACaTGGGGAC^ 

GGAGTGTCCAGGCMCTGAGAACCAAAGCCTGGAACAGACAGCTCTACCCTGAGTGGACCGAAGCCCAAAGGCTCATCTG6AGG 

TCACGAAGCCCCTGCCTTTCTGCCTTGGCATAGGCTCTTCCTCCTGAGATGGGAACAGGAAATCCAAGCCGCTATGACACCCGGAACCCAAAGCCCTT 

TCTTTCTGCTCCTGCTCCTGACAGTGCTCACCGTCGTGACAGGCTCCGGCCATGCCTCCGACffiJ^J^GCCTCCACGTCGGCACACAGTG 

AGAAGGTGTCCCCAAGAGGGATTCGATCACAGAGACTCCAAGGTCAGCCTCAACGTCACCTCCGCCTCCGGCTCCGCCTCCGGCTCCGCCTCCACCCT 

CGTGCATAACGGAACCTCCGCCAGAGCCACAACCACACCCGCTCTGGAAGGCTTTGCCTCCCCCCTCACCGGAATCGCTGACGCTAGCCAAAGCTCCA 

TGCATAACGCTCTGCATATCTATATGAATGGCACAAGGGATACCCTCCTGGGACCCGGAAGGCCTTACAGAGCCATTGACTTTAGCCATCAGGGACCC 

GCTTTCGTCaCCTGGOVCAGATACCATGCCGCTATGCTCCTGGCTGTGCTCTACTGTCTGCTCTGGTCCTTCCAAACCTCCGCCGGACACTTTCCCAG 

AGCCTGTGTGTCCAGCAAAGCClTTGAGCTCACCGTCftGCTGTCAGGGAGGCCTCCCCAAAGAGGCTTGCATGGA^^ 

CCCCTGCCCyUiGTGGATGACaGAGAGTCCTGGCCTAGCGTCTTCTATAACAGAACCT3TCaGTGTAGCGGAAACTTTATGGGATTC 

TGTGAGTCCGCCGAAATCCTCCAGGCTGTGCOTAGCGGAGAGGGAGACGCTTTCGAACTGACAGTGTCCTGCCAAGGCGGACTGCCTAAGG 

CTGCGATAGCCTCGACGATTACAATCACCTCGTGACACTGTGTAACGGAACCTATGAGGGACTGCTCAGGAGZUUVCCAAATGGGACTGCTCAGCAATG 

CCCCTCTGGGACCCCAATTCCCTTTCACAGGCGTCGACGATAGGGAAAGCTGGCCCTCCGTGTTTTACaATAGGACyiTACGAAAAG^^ 

CAAAGCCCTCCCCCTTACTCCCCCGCTGCCATCGTGGGCATTCTGCTCGTGCTCATGGCTGTGGTCCTGGCTAGCCTCATCTATAGGAGAAGGCTCAT 

GAAACAGGATTTCTCCGTGCCTGGCATTGGCATTCTGACAGTGATTCTGGGAGTGCTCCTGCTCATCGGATGCTGGTACTGTAGGAGAAGGAATGGCT 

ATAGGGCTCTGATGTACrCCTACCTCCAGGATAGCGATCCCGATAGCTTTCAGGATTACATTAAGTCCTACCTCGAGCAAGCCTCCAGGATTTGGTCC 

TGGCTCCAGGGACAGGATGTGACACTGGCTCCCGCTACCGAACCCGCTAGCGGAAGCGCTGCCACATGGGGACAGGATGTGACAAGCGTCCCCGTCAC 

CTCCTGCGGAAGCTCGCCCGTCCCCGGAACCACAGACGGACACAGACCCACAGCCGAAGCCCCTAACACAACCGCTGGCCAAGTGCCTCTGGTCCACA 

ATGGCACyU^GCGCTAGGGCTACCa^akACCCCTGCCTCCAAGTCCACCCCTTTCrCCATC^ 

CCCACAACCCTCCTGGTCGTGATGGGCACACTGGTCGCCCTCGTGGGACTGTTrGTGCTCCTGGCTTTCCTCACCAC^ 

CGCTAGGGAACTGCCTATCCCTGAGCCTGAGGGACCCXSATGCCrCCSUKSVTTATGTCCACCGAAGGCGCTGTGAC^ 

TTTTCCTCTGCTGGGGCCCTTTCTOTCTGCATCTGACACTGATTGTGCTCTGCCCrCTGGGAGCCXSCTATGGTCGGCGCTGTGC^ 

GCCGGACTGGTCAGCCTCCTGTGTAGGCATAAGAGAAAGCAACTGCCTGGCGCTCTGGTCGCCAGAGCCGCTGTGCTCCAGCAACTGGATAACGTGAT 

CGATGTGATTACCTGTAGCTCCATGCTCAGCTCCCTGTGTATGACACCCGAAAAGGTCCCCGTCAGCGAAGTGATGGGCACAACCCTCGCCGAAATGT 

CCACCCCTGAGGCTACCGGAATGACACCCGCTCAGACAAGCGCTGGCCATTTCCCTAGGGCTTGCGTCAGCTCCAAGAATCTGATGGAGAAAGAGTGT 

TGCCCTCCCTGGAGCGGAGACAGAGCCCTCAGGTATCACTCCATCGTCACCCTCCCCAGAGCCCCTAGGGCTGTGGCTGCCATTTGGGTCGCCTCCGT 

GGTCTTCTCCACCCTCTTCAGAGAGGGAACCATTAACGTCCACGATGTGGAAACCCAATTCAATCAGTATAAGACAGAGGCTGCCTCCAGGTATAACC 

TCaCCATTAACCTCATGGAAAAGQAATGCTGTCCCCCTTGGTCCGGCGATAGGTC C CC CTGTGG CCi^CTGTCCGGCAGAGG CTCC TGCCAAAAC ATT 

ATCAAAAGCTATCTGGAACAGGCTAGCAGAATCTGGAGCTGGCTGCTCGGCGCTGCCATGGTGGGAGCCGTCCTGACAGCCCTCCTGGCTCCCACAAC 

CCTCGCCTCCCACTCCACCAAAACCGATGCCTCCAGCACacyi.CCATAGCTCCGTGCCTCCCCTCACCTCCaGCAATC^ 

CTGGCTGGGGCATTGCCCTCCrrGGTCCTGGTCTGCGTCCTGGTCGCCCTCGCCATTGTGTATCTGATTGCCCTCn^ 

AGGTATATCTCCATCTTTTACGCTCTGAGATACCATAGCATTGTGACACTGCCTAGGGCTCCCAGACTGATTATGCCTGGCCAAGAGGCTGGCCTCGG 
CCAAGTGCCTCTGATTGTGGGAATCCTCCTGGTCCTGATGGCCGTCGTGCTCGCCTCCACCCTCGTGGCTCTGGTCGGCCTCTTCGTCCTGCTCGCCT 
TTCTGCAATACAGAAGGCTCAGGAAAGGCTATACCCCTCTGATG6AGACACTGATTAGCCCTAACTGCGTGTTTAGGCAATGGAGAGTGGTCTGCGAT 
AGCCTCGAGGATTACGATACCCTCGGCACACTGTGTAACTCCCTGAATAGCACACCCACAGCCATTCCCCAACTGGGACTGGCTGCCAATCAGACAGG 
CGCTAGGTGTCTGGAAGTGTCCATCTCCGACGGACACAGACCCCTCCAGGAAGTGTATCCCGAA6CCAATGCCCCTATCGGACACAATAGGGAAAGCT 
ATATGGTCCCCTTTATCCCTCTGTATGAGCCTAGCGGAACCACAAGCGTCCAGGTCCCCACAACCGAAGTGATTAGCACAGCCCCTGTGCAAATGCCT
ACCGCTGAGTCCACCGGAAAGAAAAGGGTCCACCCTGACTATGTGATTACCACACAGCATTGGCTCGGCCTCCTGGGACCCAATGGCACACAGCCrCA 
GTTTGCCAATCTGCATCAGATTCTGAAAGGCGGAAGCGGAACCTATTGCCTCAACGTCAGCCTCGCCGATACCAATAGCCTCGCCGTCGTGTCCACCG 
AACCCGAACCCGAAGGCCCTGACGCTAGCTCCATCATGAGCACAGAGTCCATCACAGGCTCCCTGGGACCCCTCCTGGATGGCACAGCCACAAGCATT 
ACCGGAAGCCTCGGCCCTCTGCTCGACGGAACCGCTACCCTCAGGCTCGTGAAAAGGCAAGTGCCTCTGGATTGCGTCCTGTAT6ACTGTTGGAGAGG 
CGGAOVGGTCAGCCTC^^GGTCAGCAATGACGGACCCACaCTGATTGGCGCTAACGCTAGCTTTAGCAT^ 

Melanoma cancer Specific Savine Scramble process 

Scramble - Output File 

Scramble version : 0.1 beta, 08/02/1999 
Num. genes : 10 

Num. segments : 121 
Segment length : 30 
Segment overlap : 15 
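
The parameters above (segment length 30, overlap 15) imply that each gene is cut into 30-residue windows that advance 15 residues at a time, which matches the offsets 1, 16, 31, ... in the records that follow. The sketch below is not the Scramble program itself; `split_into_segments` is a hypothetical helper, and its stopping rule (stop once a window reaches the end of the sequence) is an assumption inferred from the shorter final segment of each gene. The test sequence is BAGE as it appears in its three records, including the two alanine residues visible at the start of the first segment and the end of the last.

```python
def split_into_segments(protein, length=30, overlap=15):
    """Cut a protein into overlapping windows.

    Returns (offset, segment) pairs with 1-based offsets, as in the listing.
    Hypothetical helper; the stopping rule is inferred from the records.
    """
    if not 0 <= overlap < length:
        raise ValueError("overlap must be non-negative and smaller than length")
    step = length - overlap  # 15 residues for the parameters in the listing
    segments = []
    start = 0
    while start < len(protein):
        segments.append((start + 1, protein[start:start + length]))
        if start + length >= len(protein):
            break  # this window already reaches the end of the sequence
        start += step
    return segments


# BAGE as read from its records, with the flanking alanines included.
bage = "AAMAARAVFLALSAQLLQARLMKEESPVVSWRLEPEDGTALCFIFAA"

for offset, segment in split_into_segments(bage):
    print(offset, len(segment), segment)
```

Under these assumptions the 47-residue input yields three segments at offsets 1, 16 and 31, the last one 17 residues long, which agrees with the BAGE records in the listing.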

Segments in original order: 



Gene : BAGE 

Segment # : 1 
Offset : 1 
1st Codon : 1 

AAMAARAVFLALSAQLLQARLMKEESPVVS 
GCCGCTATGGCTGCCAGAGCCGTCTTCCTCGCCCTCAGCGCTCAGCTCCTGCAAGCCAGACTGATGAAGGAAGAGTCCCCCGTCGTGTCC 

Gene : BAGE 

Segment# : 2 
Offset : 16 

1st Codon : 1 

LLQARLMKEESPVVSWRLEPEDGTALCFIF 
CTGCTCCAGGCTAGGCTCATGAAAGAGGAAAGCCCTGTGGTCAGCTGGAGGCTCGAGCCTGAGGATGGCACAGCCCTCTGCTTTATCTTT 
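
Each record pairs a peptide segment with the DNA that encodes it, so a record can be checked by translating its DNA line with the standard genetic code and comparing the result with the peptide line. The sketch below assumes the standard code applies and that translation starts at the first base of the DNA line (consistent with the "1st Codon : 1" field); `translate` and `CODON_TABLE` are illustrative names, not part of the Scramble program. A character outside A, C, G, T (for example a digit left by a scanning error) raises `KeyError`, which makes damaged lines easy to spot.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA string in frame 1; '*' marks a stop codon."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("DNA length is not a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# BAGE segment 2, copied from the record above.
segment_dna = (
    "CTGCTCCAGGCTAGGCTCATGAAAGAGGAAAGCCCTGTGGTCAGC"
    "TGGAGGCTCGAGCCTGAGGATGGCACAGCCCTCTGCTTTATCTTT"
)
segment_peptide = "LLQARLMKEESPVVSWRLEPEDGTALCFIF"

print(translate(segment_dna) == segment_peptide)
```

The same check applied to adjacent records shows that the shared 15-residue overlap is encoded with different codons in each segment, so the DNA lines of neighbouring segments are not expected to match even where the peptides do.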

Gene : BAGE 

Segment # : 3 
Offset : 31 

1st Codon : 1 

WRLEPEDGTALCFIFAA 
TGGAGACTGGAACCCGAAGACGGAACCGCTCTGTGTTTCATTTTCGCTGCC 

Gene : GAGE-1 

Segment # : 1 
Offset : 1 

1st Codon : 1 

AAMSWRGRSTYRPRPRRYVEPPEMIGPMRP 
GCCGCTATGTCCTGGAGAGGCAGAAGCACATACAGACCCAGACCCAGAAGGTATGTGGAACCCCCTGAGATGATCGGACCCATGAGGCCT 

Gene : GAGE-1 

Segment# : 2 
Offset : 16 

1st Codon : 1 

RRYVEPPEMIGPMRPEQFSDEVEPATPEEG 
AGGAGATACGTCGAGCCTCCCGAAATGATTGGCCCTATGAGACCCGAACAGTTTAGCGATGAGGTCGAGCCTGCCACACCCGAAGAGGGA 

Gene : GAGE-1 

Segment # : 3 

Offset : 31 

1st Codon : 1 

EQFSDEVEPATPEEGEPATQRQDPAAAQEG 
GAGCAATTCTCCGACGAAGTGGAACCCGCTACCCCTGAGGAAGGCGAACCCGCTACCCAAAGGCAAGACCCTGCCGCTGCCCAAGAGGGA 

Gene : GAGE-1 

Segment # : 4 
Offset : 46 
1st Codon : 1 

EPATQRQDPAAAQEGEDEGASAGQGPKPEA 
GAGCCTGCCACACAGAGACAGGATCCCGCTGCCGCTCAGGAAGGCGAAGACGAAGGCGCTAGCGCTGGCCAAGGCCCTAAGCCTGAGGCT 

Gene : GAGE-1 

Segment # : 5 

Offset : 61 

1st Codon : 1 

EDEGASAGQGPKPEADSQEQGHPQTGCECE 
GAGGATGAGGGAGCCTCCGCCGGACAGGGACCCAAACCCGAAGeCGATAGCCAAGAGCAAGGCmTCCCCiWiACCGGATGCGAATGC 

Gene : GAGE-1 

Segment # : 6 
Offset : 76 

1st Codon : 1 

DSQEQGHPQTGCECEDGPDGQEMDPPNPEE 
GACTCCCAGGAACAGGGACACCCTCAGACAGGCTGTGAGTGTGAGGATGGCCCTGACGGACAGGAAATGGATCCCCCTAACCCTGAGGAA 

Gene : GAGE-1 

Segment # : 7 
Offset : 91 
1st Codon : 1 

DGPDGQEMDPPNPEEVKTPEEEMRSHYVAQ 
GACGGACCCGATGGCCAAGAGATGGACCCTCCCAATCCCGJ^AGAGGTCTIAGACACCCGAAGAGGAAATGAGJ^ 

Gene : GAGE-1 

Segment # : 8 
Offset : 106 
1st Codon : 1 

VKTPEEEMRSHYVAQTGILWLLMNNCFLNL 
GTGAAAACCCCTGAGGAAGAGATGAGGTCCCACTATGTGGCTCAGACAGGCATTCTGTGGCTGCTCATGAATAACTGTTTCCTCAACCTC 

Gene : GAGE-1 

Segment # : 9 
Offset : 121 
1st Codon : 1 

TGILWLLMNNCFLNLSPRKPAA 
ACCGGAATCCTCTGGCTCCTGATGAACAATTGCTTTCTGAATCTGTCCCCCAGAAAGCCTGCCGCT 

Gene : gp100In4 

Segment # : 1 
Offset : 1 
1st Codon : 1 

AASWSQKRSFVYVWKTWGEGLPSQPIIHTC 
GCCGCTAGCTGGAGCCAAAAGAGAAGCTTTGTGTATGTGTGGAAGACATGGGGAGAGGGACTGCCTAGCCAACCCATTATCCATACCTGT 

Gene : gp100In4 

Segment # : 2 
Offset : 16 

1st Codon : 1 

TWGEGLPSQPIIHTCVYFFLPDHLSFGRPF 
ACCTGGGGCGAAGGCCTCCCCTCCCAGCCTATCATTCACACATGCGTCTACTTTTTCCTCCCCGATCACCTCAGCTTTGGCAGACCCTTT 

Gene : gp100In4 

Segment # : 3 
Offset : 31 
1st Codon : 1 

VYFFLPDHLSFGRPFHLNFCDFLAA 
GTGTATTTCTTTCTGCCTGACCATCTGTCCTTCGGAAGGCCTTTCCATCTGAATTTCTGTGACTTTCTGGCTGCC 

Gene : MAGE-1 

Segment # : 1 
Offset : 1 

1st Codon : 1 

AAMSLEQRSLHCKPEEALEAQQEALGLVCV 
GCCGCTATGTCCCTGGAACAGAGAAGCCTCCACTGTAAGCCTGAGGAAGCCCTCGAGGCTCAGCAAGAGGCTCTGGGACTGGTCTGCGTC 

Gene : MAGE-1 

Segment # : 2 
Offset : 16 
1st Codon : 1 

EALEAQQEALGLVCVQAATSSSSPLVLGTL 
GAGGCTCTGGAAGCCCAACAGGAAGCCCTCGGCCTCGTGTGTGTGCAAGCCGCTACCTCCAGCTCCAGCCCTCTGGTCCTGGGAACCCTC 

Gene : MAGE-1 

Segment # : 3 
Offset : 31 
1st Codon : 1 

QAATSSSSPLVLGTLEEVPTAGSTDPPQSP 
CAGGCTGCCACAAGCTCCAGCTCCCCCCTCGTGCTCGGCACACTGGAAGAGGTCCCCACAGCCGGAAGCACAGACCCTCCCCAAAGCCCT 

Gene : MAGE-1 

Segment # : 4 
Offset : 46 
1st Codon : 1 

EEVPTAGSTDPPQSPQGASAFPTTINFTRQ 
GAGGAAGTGCCTACCGGTGGCTCCACCGATCCCCCTCAGTCCCCCGAAGGCGCTAGCGCTTTCCCTACCACAATCAATTTCAC2UV.GGCAA 

Gene : MAGE-1 

Segment# : 5 
Offset : 61 
1st Codon : 1 

QGASAFPTTINFTRQRQPSEGSSSREEEGP 
CAGGGAGCCTCCGCCTTTCCCACAACCATTAACTTTACCAGACAGAGACAGCCTAGCGAAGGCTCCAGCTCC^ 

Gene : MAGE-1 

Segment # : 6 
Offset : 76 
1st Codon : 1 

RQPSEGSSSREEEGPSTSCILESLFRAVIT 
AGGCAACCCTCCGAGGGAAGCTCCAGCAGAGAGGAAGAGGGACCCTCCACCTCCTGCATTCTGGAAAGCCTCTTCAGAGCCGTCATCACA 

Gene : MAGE-1 

Segment # : 7 
Offset : 91 
1st Codon : 1 

STSCILESLFRAVITKKVADLVGFLLLKYR 
AGCACAAGCTGTATCCTCGAGTCCCTGTTTAGGGCTGTGATTACCAAAAAGGTCGCCGATCTGGTCGGCTTTCTGCTCCTGAAATACAGA 

Gene : MAGE-1 

Segment # : 8 
Offset : 106 
1st Codon : 1 

KKVADLVGFLLLKYRAREPVTKAEMLESVI 
AAGAAAGTGGCTGACCTCGTGGGATTCCTCCTGCTCAAGTATAGGGCTAGGGAACCCGTCACCAAAGCCGAAATGCTCGAGTCCGTGATT 

Gene : MAGE-1 

Segment # : 9 
Offset : 121 
1st Codon : 1 

AREPVTKAEMLESVIKNYKHCFPEIFGKAS 
GCCAGAGAGCCTGTGACAAAGGCTGAGATGCTGGAAAGCGTCATCAAAAACTATAAGCATTGCTTTCCCGAAATCTTTGGCAAAGCCTCC 

Gene : MAGE-1 

Segment # : 10 
Offset : 136 
1st Codon : 1 

KNYKHCFPEIFGKASESLQLVFGIDVKEAD 
AAGAATTACAAACACTGTTTCCCTGAGATTTTCGGAAAGGCTAGCGAAAGCCTCCAGCTCGTGTTTGGCATTGACGTCAAGGAAGCCGAT 

Gene : MAGE-1 

Segment # : 11 
Offset : 151 
1st Codon : 1 

ESLQLVFGIDVKEADPTGHSYVLVTCLGLS 
GAGTCCCTGCAACTGGTCTTCGGAATCGATGTGAAAGAGGCTGACCCTACCGGACACTCCTACGTCCTGGTCACCTGTCTGGGACTGTCC 

Gene : MAGE-1 

Segment # : 12 
Offset : 166 
1st Codon : 1 

PTGHSYVLVTCLGLSYDGLLGDNQIMPKTG 
CCCACAGGCCATAGCTATGTGCTCGTGACATGCCTCGGCCTCAGCTATGACGGACTGCTCGGCGATAACCAAATCATGCCCAAAACCGGA 

Gene : MAGE-1 

Segment # : 13 
Offset : 181 
1st Codon : 1 

YDGLLGDNQIMPKTGFLIIVLVMIAMEGGH 
TACGATGGCCTCCTGGGAGACAATCAGATTATGCCTAAGACAGGCTTTCTGATTATCGTCCTGGTCATGATTGCCATGGAGGGAGGCCAT 

Gene : MAGE-1 

Segment # : 14 
Offset : 196 
1st Codon : 1 

FLIIVLVMIAMEGGHAPEEEIWEELSVMEV 
TTCCTCATCATTGTGCTCGTGATGATCGCTATGGAAGGCGGACACGCTCCCGAAGAGGAAATCTGGGAGGAACTGTCCGTGATGGAGGTC 

Gene : MAGE-1 

Segment # : 15 
Offset : 211 

1st Codon : 1 

APEEEIWEELSVMEVYDGREHSAYGEPRKL 
GCCCCTGAGGAAGAGATTTGGGAAGAGCTCAGCGTCATGGAAGTGTATGACGGAAGGGAACACTCCGCCTATGGCGAACCCAGAAAGCTC 

Gene : MAGE-1 

Segment # : 16 
Offset : 226 

1st Codon : 1 

YDGREHSAYGEPRKLLTQDLVQEKYLEYRQ 
TACGATGGCAGAGAGCATAGCGCTTACGGAGAGCCTAGGAAACTGCTCACCCAAGACCTCGTGCAAGAGAAATACCTCGAGTATAGGCAA 

Gene : MAGE-1 

Segment # : 17 
Offset : 241 

1st Codon : 1 

LTQDLVQEKYLEYRQVPDSDPARYEFLWGP 
CTGACACAGGATCTGGTCCAGGAAAAGTATCTGGAATACAGACAGGTCCCCGATAGCGATCCCGCTAGGTATGAGTTTCTGTGGGGCCCT 

Gene : MAGE-1 

Segment # : 18 
Offset : 256 
1st Codon : 1 

VPDSDPARYEFLWGPRALAETSYVKVLEYV 
GTGCCTGACTCCGACCCTGCCAGATACGAATTCCTCTGGGGACCCAGAGCCCTCGCCGAAACCTCCTACGTCAAGGTCCTGGAATACGTC 

Gene : MAGE-1 

Segment # : 19 
Offset : 271 
1st Codon : 1 

RALAETSYVKVLEYVIKVSARVRFFFPSLR 
AGGGCTCTGGCTGAGACAAGCTATGTGAAAGTGCTCGAGTATGTGATTAAGGTCAGCGCTAGGGTCAGGTTTTTCTTTCCCTCCCTGAGA 

Gene : MAGE-1 

Segment # : 20 
Offset : 286 
1st Codon : 1 

IKVSARVRFFFPSLREAALREEEEGVAA 
ATCAAAGTGTCCGCCAGAGTGAGATTCTTTTTCCCTAGCCTCAGGGAAGCCGCTCTGAGAGAGGAAGAGGAAGGCGTCGCCGCT 

Gene : MAGE-3 

Segment # : 1 
Offset : 1 
1st Codon : 1 

AAMPLEQRSQHCKPEEGLEARGEALGLVGA 
GCCGCTATGCCTCTGGAACAGAGAAGCCAACACTGTAAGCCTGAGGAAGGCCTCGAGGCTAGGGGAGAGGCTCTGGGACTGGTCGGCGCT 

Gene : MAGE-3 

Segment # : 2 
Offset : 16 
1st Codon : 1 

EGLEARGEALGLVGAQAPATEEQEAASSSS 
GAGGGACTGGAAGCCAGAGGCGAAGCCCTCGGCCTCGTGGGAGCCCAAGCCCCTGCCACAGAGGAACAGGAAGCCGCTAGCTCCAGCTCC 

Gene : MAGE-3 

Segment # : 3 
Offset : 31 
1st Codon : 1 

QAPATEEQEAASSSSTLVEVTLGEVPAAES 
CAGGCTCCCGCTACCGAAGAGCAAGAGGCTGCCTCCAGCTCCAGCACACTGGTCGAGGTCACCCTCGGCGAAGTGCCTGCCGCTGAGTCC 

Gene : MAGE-3 

Segment # : 4 
Offset : 46 
1st Codon : 1 

TLVEVTLGEVPAAESPDPPQSPQGASSLPT 
ACCCTCGTGGAAGTGACACTGGGAGAGGTCCCCGCTGCCGAI^GCCCTGACCCTCCCCIVJU^GCCCTCAGGGAGCCTCCAGC 

Gene : MAGE-3 

Segment # : 5 
Offset : 61 
1st Codon : 1 

PDPPQSPQGASSLPTTMNYPLWSQSYEDSS 
CCCGATCCCCCTCAGTCCCCCCAAGGCGCTAGCTCCCTGCCTACCACAATGAATTACCCTCTGTGGAGCCA^ 

Gene : MAGE-3 

Segment # : 6 
Offset : 76 
1st Codon : 1 

TMNYPLWSQSYEDSSNQEEEGPSTFPDLES 
ACCATGAACTATCCCCTCTGGTCCCAGTCCTACGAAGACTCCAGCAATCAGGAAGAGGAAGGCCCTAGCACATTCCCTGACCTCGAGTCC 

Gene : MAGE-3 

Segment # : 7 
Offset : 91 
1st Codon : 1 

NQEEEGPSTFPDLESEFQAALSRKVAELVH 
AACCAAGAGGAAGAGGGACCCTCCACCTTTCCCGATCTGGAAAGCGAATTCCAAGCCGCTCTGTCCAGGAAAGTGGCTGAGCTCGTGCAT 

Gene : MAGE-3 

Segment # : 8 
Offset : 106 

1st Codon : 1 

EFQAALSRKVAELVHFLLLKYRAREPVTKA 
GAGTTTCAGGCTGCCCTCAGCAGAAAGGTCGCCGAACTGGTCCACTTTCTGCTCCTGAAATACAGAGCCAGAGAGCCTGTGACAAAGGCT 

Gene : MAGE-3 

Segment # : 9 
Offset : 121 
1st Codon : 1 

FLLLKYRAREPVTKAEMLGSVVGNWQYFFP 
TTCCTCCTGCTCAAGTATAGGGCTAGGGAACCCGTCACCAAAGCCGAAATGCTCGGCTCCGTGGTCGGCAATTGGCAATACTTTTTCCCT 

Gene : MAGE-3 

Segment # : 10 
Offset : 136 

1st Codon : 1 

EMLGSVVGNWQYFFPVIFSKASSSLQLVFG 
GAGATGCTGGGAAGCGTCGTGGGAAACTGGCAGTATTTCTTTCCCGTCATCTTTAGCAAAGCCTCCAGCTCCCTGCAACTGGTCTTCGGA 

Gene : MAGE-3 

Segment # : 11 
Offset : 151 

1st Codon : 1 

VIFSKASSSLQLVFGIELMEVDPIGHLYIF 
GTGATTTTCTCCAAGGCTAGCTCCAGCCTCCAGCTCGTGTTTGGCATTGAGCTCATGGAAGTGGATCCCATTGGCCATCTGTATATCTTT 

Gene : MAGE-3 

Segment # : 12 
Offset : 166 
1st Codon : 1 

IELMEVDPIGHLYIFATCLGLSYDGLLGDN 
ATCGAACTGATGGAGGTCGACCCTATCGGACACCTCTACATTTTCGCTACCTGTCTGGGACTGTCCTACGATGGCCTCCTGGGAGACAAT 

Gene : MAGE-3 

Segment # : 13 
Offset : 181 

1st Codon : 1 

ATCLGLSYDGLLGDNQIMPKAGLLIIVLAI 
GCCACATGCCTCGGCCTCAGCTATGACGGACTGCTCGGCGATAACCAAATCATGCCCAAAGCCGGACTGCTCATCATTGTGCTCGCCATT 

Gene : MAGE-3 

Segment # : 14 
Offset : 196 

1st Codon : 1 

QIMPKAGLLIIVLAIIAREGDCAPEEKIWE 
CAGATTATGCCTAAGGCTGGCCTCCTGATTATCGTCCTGGCTATCATTGCCAGAGAGGGAGACTGTGCCCCTGAGGAAAAGATTTGGGAA 

Gene : MAGE-3 

Segment # : 15 
Offset : 211 
1st Codon : 1 

IAREGDCAPEEKIWEELSVLEVFEGREDSI 
ATCGCTAGGGAAGGCGATTGCGCTCCCGAAGAGAAAATCTGGGAGGAACTGTCCGTGCTCGAGGTCTTCGAAGGCAGAGAGGATAGCATT 

Gene : MAGE-3 

Segment # : 16 
Offset : 226 
1st Codon : 1 

ELSVLEVFEGREDSILGDPKKLLTQHFVQE 
GAGCTCAGCGTCCTGGAAGTGTTTGAGGGAAGGGAAGACTCCATCCTCGGCGATCCCAAAAAGCTCCTGACACAGCATTTCGTCCAGGAA 

Gene : MAGE-3 

Segment # : 17 
Offset : 241 

1st Codon : 1 

LGDPKKLLTQHFVQENYLEYRQVPGSDPAC 
CTGGGAGACCCTAAGAAACTGCTCACCCAACACTTTGTGCAAGAGAATTACCTCGAGTATAGGCAAGTGCCTGGCTCCGACCCTGCCTGT 

Gene : MAGE-3 

Segment # : 18 
Offset : 256 

1st Codon : 1 

NYLEYRQVPGSDPACYEFLWGPRALVETSY 
AACTATCTGGAATACAGACAGGTCCCCGGAAGCGATCCCGCTTGCTATGAGTTTCTGTGGGGCCCTAGGGCTCTGGTCGAGACAAGCTAT 

Gene : MAGE-3 

Segment # : 19 
Offset : 271 
1st Codon : 1 

YEFLWGPRALVETSYVKVLHHMVKISGGPH 
TACGAATTCCTCTGGGGACCCAGAGCCCTCGTGGAAACCTCCTACGTCAAGGTCCTGCATCACATGGTGAAAATCTCCGGCGGACCCCAT 

Gene : MAGE-3 

Segment # : 20 
Offset : 286 

1st Codon : 1 

VKVLHHMVKISGGPHISYPPLHEWVLREGE 
GTGAAAGTGCTCCACCATATGGTCAAGATTAGCGGAGGCCCTCACATTAGCTATCCCCCTCTGCATGAGTGGGTGCTCAGGGAAGGCGAA 

Gene : MAGE-3 

Segment # : 21 
Offset : 301 

1st Codon : 1 

ISYPPLHEWVLREGEEAA 
ATCTCCTACCCTCCCCTCCACGAATGGGTCCTGAGAGAGGGAGAGGAAGCCGCT 

Gene : PRAME 

Segment # : 1 
Offset : 1 
1st Codon : 1 

AAMERRRLWGSIQSRYISMSVWTSPRRLVE 
GCCGCTATGGAAAGGAGAAGGCTCTGGGGAAGCATTCAGTCCAGGTATATCTCCATGTCCGTGTGGACCTCCCCCAGAAGGCTCGTGGAA 

Gene : PRAME 

Segment # : 2 
Offset : 16 

1st Codon : 1 

YISMSVWTSPRRLVELAGQSLLKDEALAIA 
TACATTAGCATGAGCGTCTGGACAAGCCCTAGGAGACTGGTCGAGCTCGCCGGACAGTCCCTGCTCAAGGATGAGGCTCTGGCTATCGCT 

Gene : PRAME 

Segment # : 3 
Offset : 31 
1st Codon : 1 

LAGQSLLKDEALAIAALELLPRELFPPLFM 
CTGGCTGGCCAAAGCCTCCTGAAAGACGAAGCCCTCGCCATTGCCGCTCTGGAACTGCTCCCCAGAGAGCTCTTCCCTCCCCTCTTCATG 

Gene : PRAME 

Segment # : 4 
Offset : 46 

1st Codon : 1 

ALELLPRELFPPLFMAAFDGRHSQTLKAMV 
GCCCTCGAGCTCCTGCCTAGGGAACTGTTTCCCCCTCTGTTTATGGCTGCCTTTGACGGAAGGCATAGCCAAACCCTCAAGGCTATGGTC 

Gene : PRAME 

Segment # : 5 
Offset : 61 

1st Codon : 1 

AAFDGRHSQTLKAMVQAWPFTCLPLGVLMK 
GCCGCTTTCGATGGCAGACACTCCCAGACACTGAAAGCCATGGTGCAAGCCTGGCCCTTTACCTGTCTGCCTCTGGGAGTGCTCATGAAA 

Gene : PRAME 

Segment # : 6 
Offset : 76 

1st Codon : 1 

QAWPFTCLPLGVLMKGQHLHLETFKAVLDG 
CAGGCTTGGCCTTTCACATGCCTCCCCCTCGGCGTCCTGATGAAGGGACAGCATCTGCATCTGGAAACCTTTAAGGCTGTGCTCGACGGA 

Gene : PRAME 

Segment # : 7 
Offset : 91 

1st Codon : 1 

GQHLHLETFKAVLDGLDVLLAQEVRPRRWK 
GGCCAACACCTCCACCTCGAGACATTCAAAGCCGTCCTGGATGGCCTCGACGTCCTGCTCGCCCAAGAGGTCAGGCCTAGGAGATGGAAA 

Gene : PRAME 

Segment # : 8 
Offset : 106 
1st Codon : 1 

LDVLLAQEVRPRRWKLQVLDLRKNSHQDFW 
CTGGATGTGCTCCTGGCTCAGGAAGTGAGACCCAGAAGGTGGAAGCTCCAGGTCCTGGATCTGAGAAAGAATAGCCATCAGGATTTCTGG 

Gene : PRAME 

Segment # : 9 
Offset : 121 

1st Codon : 1 

LQVLDLRKNSHQDFWTVWSGNRASLYSFPE 
CTGCAAGTGCTCGACCTCAGGAAAAACTCCCACCAAGACTTTTGGACAGTGTGGAGCGGAAACAGAGCCTCCCTGTATAGCTTTCCCGAA 

Gene : PRAME 

Segment # : 10 
Offset : 136 

1st Codon : 1 

TVWSGNRASLYSFPEPEAAQPMTKKRKVDG 
ACCGTCTGGTCCGGCAATAGGGCTAGCCTCTACTCCTTCCCTGAGCCTGAGGCTGCCCAACCCATGACCAAAAAGAGAAAGGTCGACGGA 

Gene : PRAME 

Segment # : 11 
Offset : 151 
1st Codon : 1 

PEAAQPMTKKRKVDGLSTEAEQPFIPVEVL 
CCCGAAGCCGCTCAGCCTATGACAAAGAAAAGGAAAGTGGATGGCCTCAGCACAGAGGCTGAGCAACCCTTTATCCCTGTGGAAGTGCTC 

Gene : PRAME 

Segment # : 12 
Offset : 166 

1st Codon : 1 

LSTEAEQPFIPVEVLVDLFLKEGACDELFS 
CTGTCCACCGAAGCCGAACAGCCTTTCATTCCCGTCGAGGTCCTGGTCGACCTCTTCCTCAAGGAAGGCGCTTGCGATGAGCTCTTCTCC 

Gene : PRAME 

Segment # : 13 
Offset : 181 

1st Codon : 1 

VDLFLKEGACDELFSYLIEKVKRKKNVLRL 
GTGGATCTGTTTCTGAAAGAGGGAGCCTGTGACGAACTGTTTAGCTATCTGATTGAGAAAGTGAAAAGGAAAAAGAATGTGCTCAGGCTC 

Gene : PRAME 

Segment # : 14 
Offset : 196 

1st Codon : 1 

YLIEKVKRKKNVLRLCCKKLKIFAMPMQDI 
TACCTCATCGAAAAGGTCAAGAGAAAGAAAAACGTCCTGAGACTGTGTTGCAAAAAGCTCAAGATTTTCGCTAT^ 

Gene : PRAME 

Segment# : 15 
Offset : 211 

1st Codon : 1 

CCKKLKIFAMPMQDIKMILKMVQLDSIEDL
TGCTGTAAGAAACTGAAAATCTTTGCCATGCCCATGCAGGATATCAAAATGATTCTGAAAATGGTCCAGCTCGACTCCATCGAAGACCTC 

Gene : PRAME 

Segment# : 16 
Offset : 226 
1st Codon : 1 

KMILKMVQLDSIEDLEVTCTWKLPTLAKFS
AAGATGATCCTCAAGATGGTGCAACTGGATAGCATTGAGGATCTGGAAGTGACATGCACATGGAAACTGCCTACCCTCGCCAAATTCTCC

Gene : PRAME 

Segment # : 17
Offset : 241

1st Codon : 1

EVTCTWKLPTLAKFSPYLGQMINLRRLLLS
GAGGTCACCTGTACCTGGAAGCTCCCCACACTGGCTAAGTTTAGCCCTTACCTCGGCCAAATGATTAACCTCAGGAGACTGCTCCTGTCC 

Gene : PRAME 

Segment # : 18
Offset : 256
1st Codon : 1

PYLGQMINLRRLLLSHIHASSYISPEKEEQ
CCCTATCTGGGACAGATGATCAATCTGAGAAGGCTCCTGCTCAGCCATATCCATGCCTCCAGCTATATCTCCCCCGAAAAGGAAGAGCAA

Gene : PRAME 

Segment # : 19
Offset : 271
1st Codon : 1

HIHASSYISPEKEEQYIAQFTSQFLSLQCL 
CACATTCACGCTAGCTCCTACATTAGCCCTGAGAAAGAGGAACAGTATATCGCTCAGTTTACCTCCCAGTTTCTGTCCCTGCAATGCCTC 

Gene : PRAME 

Segment # : 20
Offset : 286
1st Codon : 1

YIAQFTSQFLSLQCLQALYVDSLFFLRGRL
TACATTGCCCAATTCACAAGCCAATTCCTCAGCCTCCAGTGTCTGCAAGCCCTCTACGTCGACTCCCTGTTTTTCCTCAGGGGAAGGCTC 

Gene : PRAME 

Segment # : 21
Offset : 301

1st Codon : 1

QALYVDSLFFLRGRLDQLLRHVMNPLETLS
CAGGCTCTGTATGTGGATAGCCTCTTCTTTCTGAGAGGCAGACTGGATCAGCTCCTGAGACACGTCATGAATCCCCTCGAGACACTGTCC

Gene : PRAME 

Segment # : 22
Offset : 316
1st Codon : 1

DQLLRHVMNPLETLSITNCRLSEGDVMHLS
GACCAACTGCTCAGGCATGTGATGAACCCTCTGGAAACCCTCAGCATTACCAATTGCAGACTGTCCGAGGGAGACGTCATGCATCTGTCC 

Gene : PRAME 

Segment # : 23
Offset : 331

1st Codon : 1

ITNCRLSEGDVMHLSQSPSVSQLSVLSLSG
ATCACAAACTGTAGGCTCAGCGAAGGCGATGTGATGCACCTCAGCCAAAGCCCTAGCGTCAGCCAACTGTCCGTGCTCAGCCTCAGCGGA 



Gene : PRAME

Segment # : 24
Offset : 346
1st Codon : 1

QSPSVSQLSVLSLSGVMLTDVSPEPLQALL
CAGTCCCCCTCCGTGTCCCAGCTCAGCGTCCTGTCCCTGTCCGGCGTCATGCTCACCGATGTGTCCCCCGAACCCCTCCAGGCTCTGCTC 


Gene : PRAME

Segment # : 25
Offset : 361
1st Codon : 1

VMLTDVSPEPLQALLERASATLQDLVFDEC
GTGATGCTGACAGACGTCAGCCCTGAGCCTCTGCAAGCCCTCCTGGAAAGGGCTAGCGCTACCCTCCAGGATCTGGTCTTCGATGAGTGT 

Gene : PRAME 

Segment # : 26
Offset : 376

1st Codon : 1

ERASATLQDLVFDECGITDDQLLALLPSLS
GAGAGAGCCTCCGCCACACTGCAAGACCTCGTGTTTGACGAATGCGGAATCACAGACGATCAGCTCCTGGCTCTGCTCCCCTCCCTGTCC

Gene : PRAME

Segment # : 27
Offset : 391
1st Codon : 1

GITDDQLLALLPSLSHCSQLTTLSFYGNSI
GGCATTACCGATGACCAACTGCTCGCCCTCCTGCCTAGCCTCAGCCATTGCTCCCAGCTCACCACACTGTCCTTCTATGGCAATAGCATT

Gene : PRAME 

Segment # : 28
Offset : 406
1st Codon : 1

HCSQLTTLSFYGNSISISALQSLLQHLIGL
CACTGTAGCCAACTGACAACCCTCAGCTTTTACGGAAACTCCATCTCCATCTCCGCCCTCCAGTCCCTGCTCCAGCATCTGATTGGCCTC 

Gene : PRAME 

Segment# : 29 
Offset : 421 
1st Codon : 1 

SISALQSLLQHLIGLSNLTHVLYPVPLESY
AGCATTAGCGCTCTGCAAAGCCTCCTGCAACACCTCATCGGACTGTCCAACCTCACCCATGTGCTCTACCCTGTGCCTCTGGAAAGCTAT

Gene : PRAME 

Segment # : 30
Offset : 436

1st Codon : 1

SNLTHVLYPVPLESYEDIHGTLHLERLAYL
AGCAATCTGACACACGTCCTGTATCCCGTCCCCCTCGAGTCCTACGAAGACATTCACGGAACCCTCCACCTCGAGAGACTGGCTTACCTC

Gene : PRAME 

Segment # : 31
Offset : 451
1st Codon : 1

EDIHGTLHLERLAYLHARLRELLCELGRPS
GAGGATATCCATGGCACACTGCATCTGGAAAGGCTCGCCTATCTGCATGCCAGACTGAGAGAGCTCCTGTGTGAGCTCGGCAGACCCTCC 

Gene : PRAME 

Segment # : 32
Offset : 466

1st Codon : 1

HARLRELLCELGRPSMVWLSANPCPHCGDR
CACGCTAGGCTCAGGGAACTGCTCTGCGAACTGGGAAGGCCTAGCATGGTGTGGCTGTCCGCCAATCCCTGTCCCCATTGCGGAGACAGA

Gene : PRAME 

Segment # : 33
Offset : 481
1st Codon : 1

MVWLSANPCPHCGDRTFYDPEPILCPCFMP
ATGGTCTGGCTCAGCGCTAACCCTTGCCCTCACTGTGGCGATAGGACATTCTATGACCCTGAGCCTATCCTCTGCCCTTGCTTTATGCCT 

Gene : PRAME 

Segment # : 34
Offset : 496
1st Codon : 1

TFYDPEPILCPCFMPNAA
ACCTTTTACGATCCCGAACCCATTCTGTGTCCCTGTTTCATGCCCAATGCCGCT

Gene : TRP2IN2

Segment # : 1 
Offset : 1 
1st Codon : 1 

AALMETHLSSKRYTEEAGGFFPWLKVYYYR 
GCCGCTCTGATGGAGACACACCTCAGCTCCAAGAGATACACAGAGGAAGCCGGAGGCTTTTTCCCTTGGCTCAAGGTCTACTATTACAGA 

Gene : TRP2IN2 

Segment # : 2
Offset : 16
1st Codon : 1

EAGGFFPWLKVYYYRFVIGLRVWQWEVISC
GAGGCTGGCGGATTCTTTCCCTGGCTGAAAGTGTATTACTATAGGTTTGTGATTGGCCTCAGGGTCTGGCAATGGGAAGTGATTAGCTGT 

Gene : TRP2IN2 

Segment # : 3 
Offset : 31 
1st Codon : 1 

FVIGLRVWQWEVISCKLIKRATTRQPAA
TTCGTCATCGGACTGAGAGTGTGGCAGTGGGAGGTCATCTCCTGCAAACTGATTAAGAGAGCCACAACCAGACAGCCTGCCGCT 

Gene : NYNSOla 

Segment # : 1
Offset : 1 

1st Codon : 1 

AAMQAEGRGTGGSTGDADGPGGPGIPDGPG 
GCCGCTATGCAAGCCGAAGGCAGAGGCACAGGCGGAAGCACAGGCGATGCCGATGGCCCTGGCGGACCCGGAATCCCTGACGGACCCGGA 

Gene : NYNSOla 

Segment# : 2 
Offset : 16 
1st Codon : 1 

DADGPGGPGIPDGPGGNAGGPGEAGATGGR
GACGCTGACGGACCCGGAGGCCCTGGCATTCCCGATGGCCCTGGCGGAAACGCTGGCGGACCCGGAGAGGCTGGCGCTACCGGAGGCAGA 

Gene : NYNSOla 

Segments : 3 
Offset : 31 

1st Codon : 1 

GNAGGPGEAGATGGRGPRGAGAARASGPGG 
GGCAATGCCGGAGGCCCTGGCGAAGCCGGAGCCACAGGCGGAAGGGGACCCAGAGGCGCTGGCGCTGCCAGAGCCTCCGGCCCTGGCGGA

Gene : NYNSOla 

Segment # : 4 
Offset : 46 

1st Codon : 1 

GPRGAGAARASGPGGGAPRGPHGGAASGLN 
GGCCCTAGGGGAGCCGGAGCCGCTAGGGCTAGCGGACCCGGAGGCGGAGCCCCTAGGGGACCCCATGGCGGAGCCGCTAGCGGACTGAAT 

Gene : NYNSOla 

Segment # : 5 
Offset : 61 
1st Codon : 1

GAPRGPHGGAASGLNGCCRCGARGPESRLL
GGCGCTCCCAGAGGCCCTCACGGAGGCGCTGCCTCCGGCCTCAACGGATGCTGTAGGTGTGGCGCTAGGGGACCCGAAAGCAGACTGCTC

Gene : NYNSOla 

Segment # : 6
Offset : 76

1st Codon : 1

GCCRCGARGPESRLLEFYLAMPFATPMEAE
GGCTGTTGCAGATGCGGAGCCAGAGGCCCTGAGTCCAGGCTCCTGGAATTCTATCTGGCTATGCCTTTCGCTACCCCTATGGAAGCCGAA 

Gene : NYNSOla 

Segment# : 7 
Offset : 91 
1st Codon : 1 

EFYLAMPFATPMEAELARRSLAQDAPPLPV
GAGTTTTACCTCGCCATGCCCTTTGCCACACCCATGGAGGCTGAGCTCGCCAGAAGGTCCCTGGCTCAGGATGCCCCTCCCCTCCCCGTC 

Gene : NYNSOla

Segment # : 8
Offset : 106
1st Codon : 1

LARRSLAQDAPPLPVPGVLLKEFTVSGNIL
CTGGCTAGGAGAAGCCTCGCCCAAGACGCTCCCCCTCTGCCTGTGCCTGGCGTCCTGCTCAAGGAATTCACAGTGTCCGGCAATATCCTC 

Gene : NYNSOla 

Segment # : 9 
Offset : 121 
1st Codon : 1

PGVLLKEFTVSGNILTIRLTAADHRQLQLS
CCCGGAGTGCTCCTGAAAGAGTTTACCGTCAGCGGAAACATTCTGACAATCAGACTGACAGCCGCTGACCATAGGCAACTGCAACTGTCC 

Gene : NYNSOla 

Segment # : 10 
Offset : 136 
1st Codon : 1 

TIRLTAADHRQLQLSISSCLQQLSLLMWIT
ACCATTAGGCTCACCGCTGCCGATCACAGACAGCTCCAGCTCAGCATTAGCTCCTGCCTCCAGCAACTGTCCCTGCTCATGTGGATCACA

Gene : NYNSOla 

Segment# : 11 
Offset : 151 
1st Codon : 1 

ISSCLQQLSLLMWITQCFLPVFLAQPPSGQ
ATCTCCAGCTGTCTGCAACAGCTCAGCCTCCTGATGTGGATTACCCAATGCTTTCTGCCTGTGTTTCTGGCTCAGCCTCCCTCCGGCCAA 

Gene : NYNSOla 

Segment # : 12 
Offset : 166 

1st Codon : 1 

QCFLPVFLAQPPSGQRRAA
CAGTGTTTCCTCCCCGTCTTCCTCGCCCAACCCCCTAGCGGACAGAGAAGGGCTGCC 

Gene : NYNSOlb 

Segment# : 1 
Offset : 1 
1st Codon : 1 

AAMLMAQEALAFLMAQGAMLAAQERRVPRA 
GCCGCTATGCTCATGGCTCAGGAAGCCCTCGCCTTTCTGATGGCCCAAGGCGCTATGCTCGCCGCTCAGGAAAGGAGAGTGCCTAGGGCT

Gene : NYNSOlb 

Segment # : 2 
Offset : 16 

1st Codon : 1 

QGAMLAAQERRVPRAAEVPGAQGQQGPRGR
CAGGGAGCCATGCTGGCTGCCCAAGAGAGAAGGGTCCCCAGAGCCGCTGAGGTCCCCGGAGCCCAAGGCCAACAGGGACCCAGAGGCAGA 

Gene : NYNSOlb 

Segment # : 3 
Offset : 31 
1st Codon : 1 

AEVPGAQGQQGPRGREEAPRGVRMAARLQG
GCCGAAGTGCCTGGCGCTCAGGGACAGCAAGGCCCTAGGGGAAGGGAAGAGGCTCCCAGAGGCGTCAGGATGGCCGCTAGGCTCCAGGGA 

Gene : NYNSOlb 

Segment # : 4 
Offset : 46 

1st Codon : 1 

EEAPRGVRMAARLQGAA 
GAGGAAGCCCCTAGGGGAGTGAGAATGGCTGCCAGACTGCAAGGCGCTGCC

Gene : LAGEl 

Segment # : 1 
Offset : 1 

1st Codon : 1 

AAMQAEGQGTGGSTGDADGPGGPGIPDGPG 
GCCGCTATGCAAGCCGAAGGCCAAGGCACAGGCGGAAGCACAGGCGATGCCGATGGCCCTGGCGGACCCGGAATCCCTGACGGACCCGGA 

Gene : LAGE1

Segment # : 2
Offset : 16





1st Codon : 1 

DADGPGGPGIPDGPGGNAGGPGEAGATGGR 
GACGCTGACGGACCCGGAGGCCCTGGCATTCCCGATGGCCCTGGCGGAAACGCTGGCGGACCCGGAGAGGCTGGCGCTACCGGAGGCAGA

Gene : LAGEl 

Segment# : 3 
Offset : 31 

1st Codon : 1 

GNAGGPGEAGATGGRGPRGAGAARASGPRG 
GGCAATGCCGGAGGCCCTGGCGAAGCCGGAGCCACAGGCGGAAGGGGACCCAGAGGCGCTGGCGCTGCCAGAGCCTCCGGCCCTAGGGGA

Gene : LAGEl 

Segment # : 4 
Offset : 46 
1st Codon : 1 

GPRGAGAARASGPRGGAPRGPHGGAASAQD 
GGCCCTAGGGGAGCCGGAGCCGCTAGGGCTAGCGGACCCAGAGGCGGAGCCCCTAGGGGACCCCATGGCGGAGCCGCTAGCGCTCAGGAT 

Gene : LAGEl 

Segment # : 5
Offset : 61
1st Codon : 1

GAPRGPHGGAASAQDGRCPCGARRPDSRLL
GGCGCTCCCAGAGGCCCTCACGGAGGCGCTGCCTCCGCCCAAGACGGAAGGTGTCCCTGTGGCGCTAGGAGACCCGATAGCAGACTGCTC

Gene : LAGEl 

Segment # : 6 
Offset : 76 

1st Codon : 1 

GRCPCGARRPDSRLLQLHITMPFSSPMEAE 
GGCAGATGCCCTTGCGGAGCCAGAAGGCCTGACTCCAGGCTCCTGCAACTGCATATCACAATGCCTTTCTCCAGCCCTATGGAAGCCGAA 

Gene : LAGEl 

Segment # : 7
Offset : 91
1st Codon : 1

QLHITMPFSSPMEAELVRRILSRDAAPLPR
CAGCTCCACATTACCATGCCCTTTAGCTCCCCCATGGAGGCTGAGCTCGTGAGAAGGATTCTGTCCAGGGATGCCGCTCCCCTCCCCAGA

Gene : LAGEl 

Segment # : 8 
Offset : 106 

1st Codon : 1 

LVRRILSRDAAPLPRPGAVLKDFTVSGNLL
CTGGTCAGGAGAATCCTCAGCAGAGACGCTGCCCCTCTGCCTAGGCCTGGCGCTGTGCTCAAGGATTTCACAGTGTCCGGCAATCTGCTC 

Gene : LAGEl 

Segment # : 9 
Offset : 121 
1st Codon : 1 

PGAVLKDFTVSGNLLFIRLTAADHRQLQLS
CCCGGAGCCGTCCTGAAAGACTTTACCGTCAGCGGAAACCTCCTGTTTATCAGACTGACAGCCGCTGACCATAGGCAACTGCAACTGTCC 

Gene : LAGEl 

Segment # : 10 
Offset : 136 
1st Codon : 1 

FIRLTAADHRQLQLSISSCLQQLSLLMWIT 
TTCATTAGGCTCACCGCTGCCGATCACAGACAGCTCCAGCTCAGCATTAGCTCCTGCCTCCAGCAACTGTCCCTGCTCATGTGGATCACA 

Gene : LAGEl 

Segment # : 11
Offset : 151 
1st Codon : 1 

ISSCLQQLSLLMWITQCFLPVFLAQAPSGQ 
ATCTCCAGCTGTCTGCAACAGCTCAGCCTCCTGATGTGGATTACCCAATGCTTTCTGCCTGTGTTTCTGGCTCAGGCTCCCTCCGGCCAA 

Gene : LAGEl 

Segment # : 12
Offset : 166
1st Codon : 1

QCFLPVFLAQAPSGQRRAA
CAGTGTTTCCTCCCCGTCTTCCTCGCCCAAGCCCCTAGCGGACAGAGAAGGGCTGCC

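The listing above appears to cut each flanked parent sequence into 30-residue windows whose start positions advance by 15 residues (Offset 1, 16, 31, ...), with a shorter final window where the sequence runs out. The following is a minimal sketch of that tiling, not the authors' program: the function name `tile_segments` and its parameters are hypothetical, and the window size and step are inferred from the offsets shown. The TRP2IN2 input string is simply the three listed TRP2IN2 segments merged at their overlaps.

```python
def tile_segments(sequence, size=30, step=15):
    """Split a sequence into overlapping windows.

    Returns (offset, segment) pairs with 1-based offsets, as in the listing.
    The last window is whatever remains, so it may be shorter than `size`.
    `size` and `step` are assumptions inferred from the listed offsets.
    """
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    segments = []
    start = 0
    while True:
        segments.append((start + 1, sequence[start:start + size]))
        if start + size >= len(sequence):
            break  # this window already reaches the end of the sequence
        start += step
    return segments


# TRP2IN2 as listed (including the flanking alanines shown in the segments).
trp2in2 = "AALMETHLSSKRYTEEAGGFFPWLKVYYYRFVIGLRVWQWEVISCKLIKRATTRQPAA"

for offset, segment in tile_segments(trp2in2):
    print(offset, segment)
```

With a 15-residue step and 30-residue windows, every residue other than those at the two ends falls in two consecutive segments, so any peptide of up to 16 residues from the flanked sequence is kept whole in at least one segment.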
Segments in scrambled order: 



MAGE-1 #15 

APEEEIWEELSVMEVYDGREHSAYGEPRKL
GCCCCTGAGGAAGAGATTTGGGAAGAGCTCAGCGTCATGGAAGTGTATGACGGAAGGGAACACTCCGCCTATGGCGAACCCAGAAAGCTC 

MAGE-1 #4 

EEVPTAGSTDPPQSPQGASAFPTTINFTRQ 
GAGGAAGTGCCTACCGCTGGCTCCACCGATCCCCCTCAGTCCCCCCAAGGCGCTAGCGCTTTCCCTACCACAATCAATTTCACAAGGCAA 

PRAME #10 

TVWSGNRASLYSFPEPEAAQPMTKKRKVDG 
ACCGTCTGGTCCGGCAATAGGGCTAGCCTCTACTCCTTCCCTGAGCCTGAGGCTGCCCAACCCATGACCAAAAAGAGAAAGGTCGACGGA 

MAGE-3 #14 

QIMPKAGLLIIVLAIIAREGDCAPEEKIWE
CAGATTATGCCTAAGGCTGGCCTCCTGATTATCGTCCTGGCTATCATTGCCAGAGAGGGAGACTGTGCCCCTGAGGAAAAGATTTGGGAA

PRAME #9 

LQVLDLRKNSHQDFWTVWSGNRASLYSFPE
CTGCAAGTGCTCGACCTCAGGAAAAACTCCCACCAAGACTTTTGGACAGTGTGGAGCGGAAACAGAGCCTCCCTGTATAGCTTTCCCGAA

PRAME #8 

LDVLLAQEVRPRRWKLQVLDLRKNSHQDFW
CTGGATGTGCTCCTGGCTCAGGAAGTGAGACCCAGAAGGTGGAAGCTCCAGGTCCTGGATCTGAGAAAGAATAGCCATCAGGATTTCTGG

NYNSOlb #2 

QGAMLAAQERRVPRAAEVPGAQGQQGPRGR 
CAGGGAGCCATGCTGGCTGCCCAAGAGAGAAGGGTCCCCAGAGCCGCTGAGGTCCCCGGAGCCCAAGGCCAACAGGGACCCAGAGGCAGA

PRAME #24 

QSPSVSQLSVLSLSGVMLTDVSPEPLQALL
CAGTCCCCCTCCGTGTCCCAGCTCAGCGTCCTGTCCCTGTCCGGCGTCATGCTCACCGATGTGTCCCCCGAACCCCTCCAGGCTCTGCTC

MAGE-1 #17 

LTQDLVQEKYLEYRQVPDSDPARYEFLWGP
CTGACACAGGATCTGGTCCAGGAAAAGTATCTGGAATACAGACAGGTCCCCGATAGCGATCCCGCTAGGTATGAGTTTCTGTGGGGCCCT

MAGE-1 #6 

RQPSEGSSSREEEGPSTSCILESLFRAVIT
AGGCAACCCTCCGAGGGAAGCTCCAGCAGAGAGGAAGAGGGACCCTCCACCTCCTGCATTCTGGAAAGCCTCTTCAGAGCCGTCATCACA

BAGE #1 

AAMAARAVFLALSAQLLQARLMKEESPVVS
GCCGCTATGGCTGCCAGAGCCGTCTTCCTCGCCCTCAGCGCTCAGCTCCTGCAAGCCAGACTGATGAAGGAAGAGTCCCCCGTCGTGTCC

PRAME #34 

TFYDPEPILCPCFMPNAA 
ACCTTTTACGATCCCGAACCCATTCTGTGTCCCTGTTTCATGCCCAATGCCGCT 

MAGE-3 #12 

IELMEVDPIGHLYIFATCLGLSYDGLLGDN
ATCGAACTGATGGAGGTCGACCCTATCGGACACCTCTACATTTTCGCTACCTGTCTGGGACTGTCCTACGATGGCCTCCTGGGAGACAAT

GAGE-1 #2 

RRYVEPPEMIGPMRPEQFSDEVEPATPEEG
AGGAGATACGTCGAGCCTCCCGAAATGATTGGCCCTATGAGACCCGAACAGTTTAGCGATGAGGTCGAGCCTGCCACACCCGAAGAGGGA 

TRP2IN2 #2 

EAGGFFPWLKVYYYRFVIGLRVWQWEVISC
GAGGCTGGCGGATTCTTTCCCTGGCTGAAAGTGTATTACTATAGGTTTGTGATTGGCCTCAGGGTCTGGCAATGGGAAGTGATTAGCTGT

PRAME #1 

AAMERRRLWGSIQSRYISMSVWTSPRRLVE
GCCGCTATGGAAAGGAGAAGGCTCTGGGGAAGCATTCAGTCCAGGTATATCTCCATGTCCGTGTGGACCTCCCCCAGAAGGCTCGTGGAA

TRP2IN2 #1 

AALMETHLSSKRYTEEAGGFFPWLKVYYYR 
GCCGCTCTGATGGAGACACACCTCAGCTCCAAGAGATACACAGAGGAAGCCGGAGGCTTTTTCCCTTGGCTCAAGGTCTACTATTACAGA

MAGE-1 #1

AAMSLEQRSLHCKPEEALEAQQEALGLVCV
GCCGCTATGTCCCTGGAACAGAGAAGCCTCCACTGTAAGCCTGAGGAAGCCCTCGAGGCTCAGCAAGAGGCTCTGGGACTGGTCTGCGTC

MAGE-1 #3

QAATSSSSPLVLGTLEEVPTAGSTDPPQSP
CAGGCTGCCACAAGCTCCAGCTCCCCCCTCGTGCTCGGCACACTGGAAGAGGTCCCCACAGCCGGAAGCACAGACCCTCCCCAAAGCCCT

PRAME #4 

ALELLPRELFPPLFMAAFDGRHSQTLKAMV
GCCCTCGAGCTCCTGCCTAGGGAACTGTTTCCCCCTCTGTTTATGGCTGCCTTTGACGGAAGGCATAGCCAAACCCTCAAGGCTATGGTC 

MAGE-3 #16 

ELSVLEVFEGREDSILGDPKKLLTQHFVQE
GAGCTCAGCGTCCTGGAAGTGTTTGAGGGAAGGGAAGACTCCATCCTCGGCGATCCCAAAAAGCTCCTGACACAGCATTTCGTCCAGGAA

MAGE-1 #11

ESLQLVFGIDVKEADPTGHSYVLVTCLGLS
GAGTCCCTGCAACTGGTCTTCGGAATCGATGTGAAAGAGGCTGACCCTACCGGACACTCCTACGTCCTGGTCACCTGTCTGGGACTGTCC 

MAGE-3 #5 

PDPPQSPQGASSLPTTMNYPLWSQSYEDSS
CCCGATCCCCCTCAGTCCCCCCAAGGCGCTAGCTCCCTGCCTACCACAATGAATTACCCTCTGTGGAGCCAAAGCTATGAGGATAGCTCC 

LAGEl #1 

AAMQAEGQGTGGSTGDADGPGGPGIPDGPG 
GCCGCTATGCAAGCCGAAGGCCAAGGCACAGGCGGAAGCACAGGCGATGCCGATGGCCCTGGCGGACCCGGAATCCCTGACGGACCCGGA

NYNSOla #12 

QCFLPVFLAQPPSGQRRAA 
CAGTGTTTCCTCCCCGTCTTCCTCGCCCAACCCCCTAGCGGACAGAGAAGGGCTGCC 

gp100In4 #2

TWGEGLPSQPIIHTCVYFFLPDHLSFGRPF
ACCTGGGGCGAAGGCCTCCCCTCCCAGCCTATCATTCACACATGCGTCTACTTTTTCCTCCCCGATCACCTCAGCTTTGGCAGACCCTTT

MAGE-1 #7

STSCILESLFRAVITKKVADLVGFLLLKYR
AGCACAAGCTGTATCCTCGAGTCCCTGTTTAGGGCTGTGATTACCAAAAAGGTCGCCGATCTGGTCGGCTTTCTGCTCCTGAAATACAGA

NYNSOla #1 

AAMQAEGRGTGGSTGDADGPGGPGIPDGPG 
GCCGCTATGCAAGCCGAAGGCAGAGGCACAGGCGGAAGCACAGGCGATGCCGATGGCCCTGGCGGACCCGGAATCCCTGACGGACCCGGA 

GAGE-1 #7 

DGPDGQEMDPPNPEEVKTPEEEMRSHYVAQ 
GACGGACCCGATGGCCAAGAGATGGACCCTCCCAATCCCGAAGAGGTCAAGACACCCGAAGAGGAAATGAGAAGCCATTACGTCGCCCAA

NYNSOla #11 

ISSCLQQLSLLMWITQCFLPVFLAQPPSGQ
ATCTCCAGCTGTCTGCAACAGCTCAGCCTCCTGATGTGGATTACCCAATGCTTTCTGCCTGTGTTTCTGGCTCAGCCTCCCTCCGGCCAA

PRAME #26 

ERASATLQDLVFDECGITDDQLLALLPSLS
GAGAGAGCCTCCGCCACACTGCAAGACCTCGTGTTTGACGAATGCGGAATCACAGACGATCAGCTCCTGGCTCTGCTCCCCTCCCTGTCC 

MAGE-3 #17 

LGDPKKLLTQHFVQENYLEYRQVPGSDPAC
CTGGGAGACCCTAAGAAACTGCTCACCCAACACTTTGTGCAAGAGAATTACCTCGAGTATAGGCAAGTGCCTGGCTCCGACCCTGCCTGT

MAGE-1 #2

EALEAQQEALGLVCVQAATSSSSPLVLGTL
GAGGCTCTGGAAGCCCAACAGGAAGCCCTCGGCCTCGTGTGTGTGCAAGCCGCTACCTCCAGCTCCAGCCCTCTGGTCCTGGGAACCCTC

NYNSOla #7 

EFYLAMPFATPMEAELARRSLAQDAPPLPV
GAGTTTTACCTCGCCATGCCCTTTGCCACACCCATGGAGGCTGAGCTCGCCAGAAGGTCCCTGGCTCAGGATGCCCCTCCCCTCCCCGTC

NYNSOlb #4 

EEAPRGVRMAARLQGAA 
GAGGAAGCCCCTAGGGGAGTGAGAATGGCTGCCAGACTGCAAGGCGCTGCC 

BAGE #3

WRLEPEDGTALCFIFAA
TGGAGACTGGAACCCGAAGACGGAACCGCTCTGTGTTTCATTTTCGCTGCC 

GAGE-1 #3 

EQFSDEVEPATPEEGEPATQRQDPAAAQEG 
GAGCAATTCTCCGACGAAGTGGAACCCGCTACCCCTGAGGAAGGCGAACCCGCTACCCAAAGGCAAGACCCTGCCGCTGCCCAAGAGGGA 

MAGE-3 #6 

TMNYPLWSQSYEDSSNQEEEGPSTFPDLES
ACCATGAACTATCCCCTCTGGTCCCAGTCCTACGAAGACTCCAGCAATCAGGAAGAGGAAGGCCCTAGCACATTCCCTGACCTCGAGTCC 

MAGE-3 #7

NQEEEGPSTFPDLESEFQAALSRKVAELVH
AACCAAGAGGAAGAGGGACCCTCCACCTTTCCCGATCTGGAAAGCGAATTCCAAGCCGCTCTGTCCAGGAAAGTGGCTGAGCTCGTGCAT 

PRAME #13 

VDLFLKEGACDELFSYLIEKVKRKKNVLRL
GTGGATCTGTTTCTGAAAGAGGGAGCCTGTGACGAACTGTTTAGCTATCTGATTGAGAAAGTGAAAAGGAAAAAGAATGTGCTCAGGCTC

NYNSOla #10 

TIRLTAADHRQLQLSISSCLQQLSLLMWIT
ACCATTAGGCTCACCGCTGCCGATCACAGACAGCTCCAGCTCAGCATTAGCTCCTGCCTCCAGCAACTGTCCCTGCTCATGTGGATCACA

MAGE-3 #1

AAMPLEQRSQHCKPEEGLEARGEALGLVGA
GCCGCTATGCCTCTGGAACAGAGAAGCCAACACTGTAAGCCTGAGGAAGGCCTCGAGGCTAGGGGAGAGGCTCTGGGACTGGTCGGCGCT 

NYNSOla #2 

DADGPGGPGIPDGPGGNAGGPGEAGATGGR 
GACGCTGACGGACCCGGAGGCCCTGGCATTCCCGATGGCCCTGGCGGAAACGCTGGCGGACCCGGAGAGGCTGGCGCTACCGGAGGCAGA 

MAGE-3 #19

YEFLWGPRALVETSYVKVLHHMVKISGGPH
TACGAATTCCTCTGGGGACCCAGAGCCCTCGTGGAAACCTCCTACGTCAAGGTCCTGCATCACATGGTGAAAATCTCCGGCGGACCCCAT

PRAME #23 

ITNCRLSEGDVMHLSQSPSVSQLSVLSLSG 
ATCACAAACTGTAGGCTCAGCGAAGGCGATGTGATGCACCTCAGCCAAAGCCCTAGCGTCAGCCAACTGTCCGTGCTCAGCCTCAGCGGA

MAGE-3 #18 

NYLEYRQVPGSDPACYEFLWGPRALVETSY 
AACTATCTGGAATACAGACAGGTCCCCGGAAGCGATCCCGCTTGCTATGAGTTTCTGTGGGGCCCTAGGGCTCTGGTCGAGACAAGCTAT 

MAGE-3 #11 

VIFSKASSSLQLVFGIELMEVDPIGHLYIF
GTGATTTTCTCCAAGGCTAGCTCCAGCCTCCAGCTCGTGTTTGGCATTGAGCTCATGGAAGTGGATCCCATTGGCCATCTGTATATCTTT 

PRAME #21 

QALYVDSLFFLRGRLDQLLRHVMNPLETLS
CAGGCTCTGTATGTGGATAGCCTCTTCTTTCTGAGAGGCAGACTGGATCAGCTCCTGAGACACGTCATGAATCCCCTCGAGACACTGTCC 

PRAME #20 

YIAQFTSQFLSLQCLQALYVDSLFFLRGRL
TACATTGCCCAATTCACAAGCCAATTCCTCAGCCTCCAGTGTCTGCAAGCCCTCTACGTCGACTCCCTGTTTTTCCTCAGGGGAAGGCTC

PRAME #7 

GQHLHLETFKAVLDGLDVLLAQEVRPRRWK
GGCCAACACCTCCACCTCGAGACATTCAAAGCCGTCCTGGATGGCCTCGACGTCCTGCTCGCCCAAGAGGTCAGGCCTAGGAGATGGAAA 

LAGEl #10 

FIRLTAADHRQLQLSISSCLQQLSLLMWIT
TTCATTAGGCTCACCGCTGCCGATCACAGACAGCTCCAGCTCAGCATTAGCTCCTGCCTCCAGCAACTGTCCCTGCTCATGTGGATCACA

PRAME #15 

CCKKLKIFAMPMQDIKMILKMVQLDSIEDL
TGCTGTAAGAAACTGAAAATCTTTGCCATGCCCATGCAGGATATCAAAATGATTCTGAAAATGGTCCAGCTCGACTCCATCGAAGACCTC

NYNSOla #5 

GAPRGPHGGAASGLNGCCRCGARGPESRLL
GGCGCTCCCAGAGGCCCTCACGGAGGCGCTGCCTCCGGCCTCAACGGATGCTGTAGGTGTGGCGCTAGGGGACCCGAAAGCAGACTGCTC 

MAGE-1 #8

KKVADLVGFLLLKYRAREPVTKAEMLESVI
AAGAAAGTGGCTGACCTCGTGGGATTCCTCCTGCTC?VAGTATAGGGCTAGGGAACCCGTCACCA2^GCCGA!^ATGCTCGAGTCCG^^ 

MAGE-1 #13 

YDGLLGDNQIMPKTGFLIIVLVMIAMEGGH
TACGATGGCCTCCTGGGAGACAATCAGATTATGCCTAAGACAGGCTTTCTGATTATCGTCCTGGTCATGATTGCCATGGAGGGAGGCCAT 

PRAME #29

SISALQSLLQHLIGLSNLTHVLYPVPLESY
AGCATTAGCGCTCTGCAAAGCCTCCTGCAACACCTCATCGGACTGTCCAACCTCACCCATGTGCTCTACCCTGTGCCTCTGGAAAGCTAT 

MAGE-3 #15 

IAREGDCAPEEKIWEELSVLEVFEGREDSI
ATCGCTAGGGAAGGCGATTGCGCTCCCGAAGAGAAAATCTGGGAGGAACTGTCCGTGCTCGAGGTCTTCGAAGGCAGAGAGGATAGCATT

PRAME #22 

DQLLRHVMNPLETLSITNCRLSEGDVMHLS
GACCAACTGCTCAGGCATGTGATGAACCCTCTGGAAACCCTCAGCATTACCAATTGCAGACTGTCCGAGGGAGACGTCATGCATCTGTCC

MAGE-1 #19 

RALAETSYVKVLEYVIKVSARVRFFFPSLR 
AGGGCTCTGGCTGAGACAAGCTATGTGAAAGTGCTCGAGTATGTGATTAAGGTCAGCGCTAGGGTCAGGTTTTTCTTTCCCTCCCTGAGA

PRAME #30

SNLTHVLYPVPLESYEDIHGTLHLERLAYL
AGCAATCTGACACACGTCCTGTATCCCGTCCCCCTCGAGTCCTACGAAGACATTCACGGAACCCTCCACCTCGAGAGACTGGCTTACCTC

NYNSOlb #1 

AAMLMAQEALAFLMAQGAMLAAQERRVPRA
GCCGCTATGCTCATGGCTCAGGAAGCCCTCGCCTTTCTGATGGCCCAAGGCGCTATGCTCGCCGCTCAGGAAAGGAGAGTGCCTAGGGCT 

MAGE-1 #10 

KNYKHCFPEIFGKASESLQLVFGIDVKEAD 
AAGAATTACAAACACTGTTTCCCTGAGATTTTCGGAAAGGCTAGCGAAAGCCTCCAGCTCGTGTTTGGCATTGACGTCAAGGAAGCCGAT

MAGE-3 #4 

TLVEVTLGEVPAAESPDPPQSPQGASSLPT 
ACCCTCGTGGAAGTGACACTGGGAGAGGTCCCCGCTGCCGAAAGCCCTGACCCTCCCCAAAGCCCTCAGGGAGCCTCCAGCCTCCCCACA

PRAME #32 

HARLRELLCELGRPSMVWLSANPCPHCGDR
CACGCTAGGCTCAGGGAACTGCTCTGCGAACTGGGAAGGCCTAGCATGGTGTGGCTGTCCGCCAATCCCTGTCCCCATTGCGGAGACAGA

PRAME #25

VMLTDVSPEPLQALLERASATLQDLVFDEC
GTGATGCTGACAGACGTCAGCCCTGAGCCTCTGCAAGCCCTCCTGGAAAGGGCTAGCGCTACCCTCCAGGATCTGGTCTTCGATGAGTGT 

GAGE-1 #5 

EDEGASAGQGPKPEADSQEQGHPQTGCECE 
GAGGATGAGGGAGCCTCCGCCGGACAGGGACCCAAACCCGAAGCCGATAGCCAAGAGCAAGGCCATCCCCAAACCGGATGCGAATGCGAA 

MAGE-3 #10 

EMLGSVVGNWQYFFPVIFSKASSSLQLVFG
GAGATGCTGGGAAGCGTCGTGGGAAACTGGCAGTATTTCTTTCCCGTCATCTTTAGCAAAGCCTCCAGCTCCCTGCAACTGGTCTTCGGA

GAGE-1 #1

AAMSWRGRSTYRPRPRRYVEPPEMIGPMRP
GCCGCTATGTCCTGGAGAGGCAGAAGCACATACAGACCCAGACCCAGAAGGTATGTGGAACCCCCTGAGATGATCGGACCCATGAGGCCT 

PRAME #2 

YISMSVWTSPRRLVELAGQSLLKDEALAIA
TACATTAGCaVTGAGCGTOTGGACTAGCCCTAGGAGACTGGTCGAGCTCGCCGGACAGTCCCTGCT 

MAGE-1 #16 

YDGREHSAYGEPRKLLTQDLVQEKYLEYRQ
TACGATGGCAGAGAGCATAGCGCTTACGGAGAGCCTAGGAAACTGCTCACCCAAGACCTCGTGCAAGAGAAATACCTCGAGTATAGGCAA

LAGE1 #12

QCFLPVFLAQAPSGQRRAA
CAGTGTTTCCTCCCCGTCTTCCTCGCCCAAGCCCCTAGCGGACAGAGAAGGGCTGCC

MAGE-3 #20

VKVLHHMVKISGGPHISYPPLHEWVLREGE
GTGAAAGTGCTCCACCATATGGTCAAGATTAGCGGAGGCCCTCACATTAGCTATCCCCCTCTGCATGAGTGGGTGCTCT^GGGAAGGCGAA 

LAGEl #7 

QLHITMPFSSPMEAELVRRILSRDAAPLPR 
CAGCTCCACATTACCATGCCCTTTAGCTCCCCCATGGAGGCTGAGCTCGTGAGAAGGATTCTGTCCAGGGATGCCGCTCCCCTCCCCAGA 

NYNSOla #9 

PGVLLKEFTVSGNILTIRLTAADHRQLQLS
CCCGGAGTGCTCCTGAAAGAGTTTACCGTCAGCGGAAACATTCTGACAATCAGACTGACAGCCGCTGACCATAGGCAACTGCAACTGTCC 

PRAME #16 

KMILKMVQLDSIEDLEVTCTWKLPTLAKFS 
AAGATGATCCTCAAGATGGTGCAACTGGATAGCATTGAGGATCTGGAAGTGACATGCACATGGAAACTGCCTACCCTCGCCAAATTCTCC 

MAGE-1 #14 

FLIIVLVMIAMEGGHAPEEEIWEELSVMEV 
TTCCTCATCATTGTGCTCGTGATGATCGCTATGGAAGGCGGACACGCTCCCGAAGAGGAAATCTGGGAGGAACTGTCCGTGATGGAGGTC 

PRAME #17 

EVTCTWKLPTLAKFSPYLGQMINLRRLLLS
GAGGTCACCTGTACCTGGAAGCTCCCCACACTGGCTAAGTTTAGCCCTTACCTCGGCCAAATGATTAACCTCAGGAGACTGCTCCTGTCC

MAGE-3 #2

EGLEARGEALGLVGAQAPATEEQEAASSSS
GAGGGACTGGAAGCCAGAGGCGAAGCCCTCGGCCTCGTGGGAGCCCAAGCCCCTGCCACAGAGGAACAGGAAGCCGCTAGCTCCAGCTCC 

MAGE-3 #21 

ISYPPLHEWVLREGEEAA
ATCTCCTACCCTCCCCTCCACGAATGGGTCCTGAGAGAGGGAGAGGAAGCCGCT 

PRAME #19 

HIHASSYISPEKEEQYIAQFTSQFLSLQCL
CACATTCACGCTAGCTCCTACATTAGCCCTGAGAAAGAGGAACAGTATATCGCTCAGTTTACCTCCCAGTTTCTGTCCCTGCAATGCCTC

NYNSOla #3

GNAGGPGEAGATGGRGPRGAGAARASGPGG
GGCAATGCCGGAGGCCCTGGCGAAGCCGGAGCCACAGGCGGAAGGGGACCCAGAGGCGCTGGCGCTGCCAGAGCCTCCGGCCCTGGCGGA

NYNSOla #4 

GPRGAGAARASGPGGGAPRGPHGGAASGLN 
GGCCCTAGGGGAGCCGGAGCCGCTAGGGCTAGCGGACCCGGAGGCGGAGCCCCTAGGGGACCCCATGGCGGAGCCGCTAGCGGACTGAAT 

MAGE-1 #5 

QGASAFPTTINFTRQRQPSEGSSSREEEGP 
CAGGGAGCCTCCGCCTTTCCCACAACCATTAACTTTACCAGACAGAGACAGCCTAGCGAAGGCTCCAGCTCCAGGGAAGAGGAAGGCCCT

NYNSOla #8 

LARRSLAQDAPPLPVPGVLLKEFTVSGNIL
CTGGCTAGGAGAAGCCTCGCCCAAGACGCTCCCCCTCTGCCTGTGCCTGGCGTCCTGCTCAAGGAATTCACAGTGTCCGGCAATATCCTC 

PRAME #5 

AAFDGRHSQTLKAMVQAWPFTCLPLGVLMK
GCCGCTTTCGATGGCAGACACTCCCAGACACTGAAAGCCATGGTGCAAGCCTGGCCCTTTACCTGTCTGCCTCTGGGAGTGCTCATGAAA

MAGE-1 #20 

IKVSARVRFFFPSLREAALREEEEGVAA 
ATCAAAGTGTCCGCCAGAGTGAGATTCTTTTTCCCTAGCCTCAGGGAAGCCGCTCTGAGAGAGGAAGAGGAAGGCGTCGCCGCT

PRAME #27 

GITDDQLLALLPSLSHCSQLTTLSFYGNSI
GGCATTACCGATGACCAACTGCTCGCCCTCCTGCCTAGCCTCAGCCATTGCTCCCAGCTCACCACACTGTCCTTCTATGGCAATAGCATT

GAGE-1 #8

VKTPEEEMRSHYVAQTGILWLLMNNCFLNL 
GTGAAAACCCCTGAGGAAGAGATGAGGTGCCACTATGTGGCTCAGACAGGCATTCTGTGGCTGCTCATGAATAACTGTTTCCTCAACCTC 

LAGE1 #11

ISSCLQQLSLLMWITQCFLPVFLAQAPSGQ
ATCTCCAGCTGTCTGCAACAGCTCAGCCTCCTGATGTGGATTACCCAATGCTTTCTGCCTGTGTTTCTGGCTCAGGCTCCCTCCGGCCAA 

PRAME #14

YLIEKVKRKKNVLRLCCKKLKIFAMPMQDI
TACCTCATCGAAAAGGTCAAGAGAAAGAAAAACGTCCTGAGACTGTGTTGCAAAAAGCTCy^GATTTTCGCTATGCCTATGCA^ 

MAGE-1 #9 

AREPVTKAEMLESVIKNYKHCFPEIFGKAS 
GCCAGAGAGCCTGTGACAAAGGCTGAGATGCTGGAAAGCGTCATCAAAAACTATAAGCATTGCTTTCCCGAAATCTTTGGCAAAGCCTCC

LAGEl #8 

LVRRILSRDAAPLPRPGAVLKDFTVSGNLL 
CTGGTCAGGAGAATCCTCAGCAGAGACGCTGCCCCTCTGCCTAGGCCTGGCGCTGTGCTCAAGGATTTCACAGTGTCCGGCAATCTGCTC 

PRAME #28 

HCSQLTTLSFYGNSISISALQSLLQHLIGL
CACTGTAGCCAACTGACAACCCTCAGCTTTTACGGAAACTCCATCTCCATCTCCGCCCTCCAGTCCCTGCTCCAGCATCTGATTGGCCTC 

PRAME #33 

MVWLSANPCPHCGDRTFYDPEPILCPCFMP
ATGGTCTGGCTCAGCGCTAACCCTTGCCCTCACTGTGGCGATAGGACATTCTATGACCCTGAGCCTATCCTCTGCCCTTGCTTTATGCCT

gp100In4 #1

AASWSQKRSFVYVWKTWGEGLPSQPIIHTC
GCCGCTAGCTGGAGCCTyUlAGAGAAGCTTTGTGTATGTGTGGAAGACATGGGGAGAGGGACTGCCTAGCCAACCCATTATCC^ 

BAGE #2 

LLQARLMKEESPVVSWRLEPEDGTALCFIF
CTGCTCCAGGCTAGGCTCATGAAAGAGGAAAGCCCTGTGGTCAGCTGGAGGCTCGAGCCTGAGGATGGCACAGCCCTCTGCTTTATCTTT 

gp100In4 #3

VYFFLPDHLSFGRPFHLNFCDFLAA
GTGTATTTCTTTCTGCCTGACCATCTGTCCTTCGGAAGGCCTTTCCATCTGAATTTCTGTGACTTTCTGGCTGCC 

PRAME #18 

PYLGQMINLRRLLLSHIHASSYISPEKEEQ
CCCTATCTGGGACAGATGATCAATCTGAGAAGGCTCCTGCTCAGCCATATCCATGCCTCCAGCTATATCTCCCCCGAAAAGGAAGAGCAA

MAGE-3 #3

QAPATEEQEAASSSSTLVEVTLGEVPAAES
CAGGCTCCCGCTACCGAAGAGCAAGAGGCTGCCTCCAGCTCCAGCACACTGGTCGAGGTCACCCTCGGCGAAGTGCCTGCCGCTGAGTCC

PRAME #6 

QAWPFTCLPLGVLMKGQHLHLETFKAVLDG
CAGGCTTGGCCTTTCACATGCCTCCCCCTCGGCGTCCTGATGAAGGGACAGCATCTGCATCTGGAAACCTTTAAGGCTGTGCTCGACGGA

PRAME #12

LSTEAEQPFIPVEVLVDLFLKEGACDELFS
CTGTCCACCGAAGCCGAACAGCCTTTCATTCCCGTCGAGGTCCTGGTCGACCTCTTCCTCAAGGAAGGCGCTTGCGATGAGCTCTTCTCC 

NYNSOlb #3 

AEVPGAQGQQGPRGREEAPRGVRMAARLQG 
GCCGAAGTGCCTGGCGCTCAGGGACAGCAAGGCCCTAGGGGAAGGGAAGAGGCTCCCAGAGGCGTCAGGATGGCGGCTAGGCTCCAGGGA

LAGEl #5 

GAPRGPHGGAASAQDGRCPCGARRPDSRLL
GGCGCTCCCAGAGGCCCTCACGGAGGCGCTGCCTCCGCCCAAGACGGAAGGTGTCCCTGTGGCGCTAGGAGACCCGATAGCAGACTGCTC 

LAGEl #4 

GPRGAGAARASGPRGGAPRGPHGGAASAQD 
GGCCCTAGGGGAGCCGGAGCCGCTAGGGCTAGCGGACCCAGAGGCGGAGCCCCTAGGGGACCCCATGGCGGAGCCGCTAGCGCTCAGGAT 

PRAME #3 

LAGQSLLKDEALAIAALELLPRELFPPLFM
CTGGCTGGCCAAAGCCTCCTGAAAGACGAAGCCCTCGCCATTGCCGCTCTGGAACTGCTCCCCAGAGAGCTCTTCCCTCCCCTCTTCATG

GAGE-1 #4 

EPATQRQDPAAAQEGEDEGASAGQGPKPEA 
GAGCCTGCCACACAGAGACAGGATCCCGCTGCCGCTCAGGAAGGCGAAGACGAAGGCGCTAGCGCTGGCCAAGGCCCTAAGCCTGAGGCT

PRAME #11 

PEAAQPMTKKRKVDGLSTEAEQPFIPVEVL
CCCGAAGCCGCTCAGCCTATGACAAAGAAAAGGAAAGTGGATGGCCTCAGCACAGAGGCTGAGCAACCCTTTATCCCTGTGGAAGTGCTC

LAGE1 #6

GRCPCGARRPDSRLLQLHITMPFSSPMEAE
GGCAGATGCCCTTGCGGAGCCAGAAGGCCTGACTCCAGGCTCCTGCAACTGCATATCACAATGCCTTTCTCCAGCCCTATGGAAGCCGAA

LAGEl #9 

PGAVLKDFTVSGNLLFIRLTAADHRQLQLS
CCCGGAGCCGTCCTGAAAGACTTTACCGTCAGCGGAAACCTCCTGTTTATCAGACTGACAGCCGCTGACCATAGGCAACTGCAACTGTCC 

PRAME #31 

EDIHGTLHLERLAYLHARLRELLCELGRPS
GAGGATATCCATGGCACACTGCATCTGGAAAGGCTCGCCTATCTGCATGCCAGACTGAGAGAGCTCCTGTGTGAGCTCGGCAGACCCTCC 

GAGE-1 #6 

DSQEQGHPQTGCECEDGPDGQEMDPPNPEE 
GACTCCCAGGAACAGGGACACCCTCAGACAGGCTGTGAGTGTGAGGATGGCCCTGACGGACAGGAAATGGATCCCCCTAACCCTGAGGAA 

TRP2IN2 #3 

FVIGLRVWQWEVISCKLIKRATTRQPAA 
TTCGTCATCGGACTGAGAGTGTGGCAGTGGGAGGTCATCTCCTGCAAACTGATTAAGAGAGCCACAACCAGACAGCCTGCCGCT

LAGE1 #2

DADGPGGPGIPDGPGGNAGGPGEAGATGGR
GACGCTGACGGACCCGGAGGCCCTGGCATTCCCGATGGCCCTGGCGGAAACGCTGGCGGACCCGGAGAGGCTGGCGCTACCGGAGGCAGA

MAGE-1 #12 

PTGHSYVLVTCLGLSYDGLLGDNQIMPKTG 
CCCACAGGCCATAGCTATGTGCTCGTGACATGCCTCGGCCTCAGCTATGACGGACTGCTCGGCGATAACCAAATCy^T^ 

MAGE-3 #9 

FLLLKYRAREPVTKAEMLGSVVGNWQYFFP
TTCCTCCTGCTCAAGTATAGGGCTAGGGAACCCGTCACCAAAGCCGAAATGCTCGGCTCCGTGGTCGGCAATTGGCAATACTTTTTCCCT 

GAGE-1 #9 

TGILWLLMNNCFLNLSPRKPAA
ACCGGAATCCTCTGGCTCCTGATGAACAATTGCTTTCTGAATCTGTCCCCCAGAAAGCCTGCCGCT 

MAGE-3 #8 

EFQAALSRKVAELVHFLLLKYRAREPVTKA 
GAGTTTCAGGCTGCCCTCAGCAGAAAGGTCGCCGAACTGGTCCACTTTCTGCTCCTGAAATACAGAGCCAGAGAGCCTGTGACAAAGGCT

MAGE-1 #18

VPDSDPARYEFLWGPRALAETSYVKVLEYV 
GTGCCTGACTCCGACCCTGCCAGATACGAATTCCTCTGGGGACCCAGAGCCCTCGCCGAAACCTCCTACGTCAAGGTCCTGGAATACGTC 

NYNSOla #6 

GCCRCGARGPESRLLEFYLAMPFATPMEAE 
GGCTGTTGCAGATGCGGAGCCAGAGGCCCTGAGTCCAGGCTCCTGGAATTCTATCTGGCTATGCCTTTCGCTACCCCTATGGAAGCCGAA 

MAGE-3 #13 

ATCLGLSYDGLLGDNQIMPKAGLLIIVLAI
GCCACATGCCTCGGCCTCAGCTATGACGGACTGCTCGGCGATAACCaAATCATGCCCAAAGCCGGACTGCTCATCaVTTGTGCTCGCC^ 

LAGEl #3 

GNAGGPGEAGATGGRGPRGAGAARASGPRG
GGCAATGCCGGAGGCCCTGGCGAAGCCGGAGCCACAGGCGGAAGGGGACCCAGAGGCGCTGGCGCTGCCAGAGCCTCCGGCCCTAGGGGA
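
The scrambled list above is joined end to end to give the artificial protein that follows. As a rough illustration only, the sketch below shuffles a list of labelled segments and concatenates them. The helper name `scramble_and_join` and the seeded shuffle are assumptions for the example; the listing does not state how the order was actually generated.

```python
import random


def scramble_and_join(segments, seed=0):
    """Shuffle (label, sequence) pairs and concatenate the sequences.

    The seeded shuffle is an illustrative stand-in; the source does not
    say which randomisation procedure produced the published order.
    """
    order = list(segments)
    random.Random(seed).shuffle(order)
    return order, "".join(sequence for _, sequence in order)


# The first three entries of the scrambled list, joined in the listed order.
listed = [
    ("MAGE-1 #15", "APEEEIWEELSVMEVYDGREHSAYGEPRKL"),
    ("MAGE-1 #4", "EEVPTAGSTDPPQSPQGASAFPTTINFTRQ"),
    ("PRAME #10", "TVWSGNRASLYSFPEPEAAQPMTKKRKVDG"),
]
joined_as_listed = "".join(sequence for _, sequence in listed)
print(joined_as_listed)

# A fresh random order of the same segments.
order, artificial = scramble_and_join(listed, seed=1)
print([label for label, _ in order])
```

Because adjacent segments in the scrambled order generally come from different positions or different parent proteins, the joined sequence keeps the short overlapping stretches of each parent while breaking up its full-length structure.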

Artificial Protein: 



APEEEIWEELSVMEVYDGREHSAYGEPRKLEEVPTAGSTDPPQSPQGASAFPTTINFTRQTVWSGNRASLYSFPEPEAAQPMTKKRKVDGQIMPKA

LIIVLAIIAREGDCAPEEKIWELQVLDLRKIJSHQDFWTVWSGNRASLYSFPELDVLLAQEVRPRRWKLQVLDLRKNSHQDFWQGAMLJ^ 

EVPGAQGQQGPRGRQSPSVSQLSVLSLSGVMLTDVSPEPLQALLLTQDLVQEKYLEYRQVPDSDPARYEFLWGPRQPSEGSSSREEEGPSTSCILESL 

FRAVITAAMAARAVFLALSAQLLQARLMKEESPWSTPYDPEPILCPCFMPNAAIELMEVDPIGHLYIFATCLGLSYDGLLGDNRRYVEPP 

PEQFSDEVEPATPEEGEAGGFFPWLKVYYYRFVIGLRWQWEVXSCAAMERRRLWGSIQSRYISMSVWTSPRRIiVEAALMETHLSSKRYTEEAGGFFP 

VJLKVYYYRAAMSLEQRSLHCKPEEALEAQQEALGLVCVQAATSSSSPLVLGTLEEVPTAGSTDPPQSPAIjELLPRELFPPLFMA^ 

elsvlevfegredsilgdpkklltqhfvqeeslqlvfgidvkeadptghsyvlvtclglspdppqspqgasslpttmnyplwsqsyedssaamqaegq 
gtggstgdadgpggpgipdgpgqcflpvflaqppsgqrraatwgeglpsqpiihtcvyfflpdhlsfgrpfstscileslfravitkkvadlvgflll 
kyraamqaegrgtggstgdadgpggpgipdgpgdgpdgqemdppnpeevktpeeemrshyvaqissclqqlsllmwitqcflpvflaqppsgqei^ 
tlqdlvfdecgitddqllallpslslgdpkklltqhfvqenyleyrqvpgsdpacealeaqqealglvcvqaatssssplvlgtlefylampfat 

AELARRSIlAQDAPPLPVEEAPRGVR^aAARLQGAAWRLEPEDGTALCFIFAAEQFSDEVEPATPEEGEPATQRQDPAAAQEGTMlS^YPLWSQSYEDSSN^ 
EEEGPSTFPDLESNQEEEGPSTFPDLESEFQAALSRKVAELVHVDLFLKEGACDELFSYLIEKVKRKKN^ 

LMWITAAMPLEQRSQHCKPEEGLEARGEALGLVGADADGPGGPGIPDGPGGNAGGPGEAGATGGRYEFLWGPRALVETSYVKVLI^^ 

crlsegdvmhlsqspsvsqlsvlslsgnyleyrqvpgsdpacyeflwgpralvetsyvifskassslqlvpgielmevdpighlyifqal^^ 
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RGRLDQLLRHVMNPLETLSYIAQFTSQFLSLQCLQALYVDSLPFLRGRLGQHIJILETFKAVIiDGLDVLIiA 
LQQLSLIl^OTITCCKKI.KIPAMPMQDIKMILKMVQLDSIEDLGAPRGPHGGAASGLNGCCRCGAR 

SViyDGLIKSDNQIMPKTGFLIIVLVMIMEGGHSISALQSLLQHLIGLSNLTHVLYPVPLESYIAREGDCAPEEKXWEELSVL^ 
HVMNPLETLSITNCRLSEGDVI4HLSRALAETSYVKVLEYVIKVSARVRFFFPSLRSNLTHVIi^ 

MAQGAMLAAQERRVPRAKNYKHCFPEI FGKASESLQLVFGIDVKEADTIiVEVTLGEVPAAES PDPPQS PQGAS SLPTHARLRELLCELG 
NPCPHCGDRVMLTDVSPEPLQAIiLERASATriQDIaVFDECEDEGASAGQGPKPEADSQEQGHPQTGCECEEMLGSWGJWQYFFPVIFSKASSSLQL^ 
GAAMSWRGRSTYRPRPRRYVEPPEMIGPMRPYISMSVWTSPRRLVEIAGQSLLKIDEALAIAYDGREHSAYGEPRKLLTQDLVQEKYLEYRQQ^ 
IiAQAP SGQRRAAVKVLHHMVKI SGGPHI S YPPLHEWVLREGEQLHITMPF S S PME AELVRRI LSRDAAPIjPRPGVLLKEFTVSGNI LTIRLTAADHRQ 
LQLSKMILKMVQLDSIEDLEVTCTWKLPTIiAKFSFLIIVLVMIAMEGGHAPEEEIWEELSVMEVEVTCTWKLPTI^ 

ARGEALGLVGAQAPATEEQEAASSSSISYPPLiHEWVIjREGEEAAHIHASSYISPEKEEQYIAQFTSQFLSLQCLGNAGGPGEAGATGGRGPRGAGAAR 
ASGPGGGPRGAGAARASGPGGGAPRGPHGGAASGIiNQGASAFPTTINFTRQRQPSEGSSSREEEGPIARRSIAQDAPPLPVPGVLLKEFTVSGNXLAA 
FDGRHSQTLKiyWQAWPFTCLPLGVLMKIKVSARVRFFFPSLREAALREEEEGVAAGITDDQLIALLPSLSHCSQLTTli 
VAQTGILWLI.MNNCFIiNIlISSCLQQIJSLIJ^mITQCFLPVFIAQAPSGQYI,IEKVKRraCI^ 

CFPEIFGKASLVRRILSRDAAPIiPRPGAVLKDFTVSGNLLHCSQLTTLSFYGNSISISALQSLLQHLIGLIWWLSANPCPHCGDRTFYDPEPILC 

MPAASWSQKRSFVYWKTWGEGIjPSQPIIHTCLLQARLMKEESPWSWRLBPEDGTALCFIFVYFFLPDHLSFGRPFHM^ 

LI^SHIHASSYISPEKEEQQAPATEEQEAASSSSTLVEVTLGEVPAAESQAWPFTCLPLGVIiMKGQHIjHIjETFKAVIiDGLST^^ 

IiKEGACDELFSAEVPGAQGQQGPRGREEAPRGV31MAARLQGGAPRGPHG6AASAQDGRCPCGARRPDSRLLGPRGAGAARASG^ 

AQDLAGQSIiliKDEAIiAIAALELLPRELPPPIiFI^PATQRQDPAAAQEGEDEGASAGQGPKPEAPEAAQPMTK^ 

GARRPDSRLLQmiTMPFSSPMEAEPGAVLKDFTVSGNLLPIRLTAADHRQIiQIiSEDIHGTLHLERIAYIiHARLI^ 

CEDGPDGQE^!DPPNPEEFVIGLRWQWEVISCKLIKRATTRQPAM)ADGPGGPGIPDGPGGNAGGPGEAGATGGRPTGHSYVlJVTCL^ 

QIMPKTGFLLLKYRAREPVTKAEMLGSWGNWQYFPPTGIIiWLIiRINNCPI^ 

EFLWGPRAIiAETSYVKVLEYVGCCRCGARGPESRIiLEFYLAMPFATPMEAEATCLGLSYDGLLGDNQIM 
RGAGAARASGPRG 
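
Each peptide in this listing is paired with a nucleotide sequence, so a segment or a stretch of the artificial DNA can be checked by translating it with the standard genetic code. The sketch below is a plain translation helper written for this purpose; the names `CODON_TABLE` and `translate` are illustrative and do not come from the specification. A character that is not a valid base (for example a stray digit from scanning) raises a `KeyError`, which makes damaged lines easy to spot.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino
    for codon, amino in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna):
    """Translate a DNA string in frame 1 into one-letter amino acid codes."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# PRAME segment 34 as listed above.
prame_34 = "ACCTTTTACGATCCCGAACCCATTCTGTGTCCCTGTTTCATGCCCAATGCCGCT"
print(translate(prame_34))  # TFYDPEPILCPCFMPNAA
```

Different copies of the same peptide in this listing use different synonymous codons, so comparisons between copies are best made at the protein level rather than by matching the DNA strings directly.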

Artificial DNA: 



GCCCCTGAGGAAGAGATTTGGGAAGAGCTCAGCGTCATGGAAGTGTATGACGGAAGGGAACACTCCGCCTATGGCGAACCCAGAAAGCTCGAGGAAGT 

GCCTACCGCTGGCTCCACCGATCCCCCTCAGTCCCCCCAAGGCGCTAGCGCTTTCCCTACCACAATCAATTTCACAAGGCAAACCGTCTGGTCCGGCA 

ATAGGGCTAGCCTCTACTCCTTCCCTGAGCCTGAGGCTGCCCAACCCATGACCAAAAAGAGAAAGGTCGACGGACAGATTATGCCTAAGGCTGGCCTC 

CTGATTATCGTCCTGGCTATCATTGCCAGAGAGGGAGACTGTGCCCCTGAGGAAAAGATTTGGGAACTGCAAGTGCTCGACCTCAGGAAAAACTCCCA 

CCAAGACTTTTGGACAGTGTGGAGCGGAAACAGAGCCTCCCTGTATAGCTTTCCCGAACTGGATGTGCTCCTGGCTCAGGAAGTGAGACCCAGAAGGT 

GGAAGCTCCAGGTCCTGGATCTGAGAAAGAATAGCCATCAGGATTTCTGGCAGGGAGCCATGCTGGCTGCCCAAGAGAGAAGGGTCCCCAGAGCCGCT 

GAGGTCCCCGGAGCCCAAGGCCAACAGGGACCCAGAGGCAGACAGTCCCCCTCCGTGTCCCAGCTCAGCGTCCTGTCCCTGTCCGGCGTCATGCTCAC 

CGATGTGTCCCCCGAACCCCTCCAGGCTCTGCTCCTGACACaGGATCTGGTCCAGGAAAAGTATCTGGAATACAGACaGGTCCCCGATAGC^ 

CTAGGTATGAGTTTCTGTGGGGCCCTAGGCAACCCTCCGAGGGAAGCTCCAGCAGAGAGGAAGAGGGACCCTCCACCTCCTGCATTCTGGAAAGCCrC 

TTCAGAGCCGTCM'CACaGCCGCTATGGCTGCCaGAGCCGTCTTCCTCGCCCTCSlGCGCTCAGCTCCTGCT^GCC^ 

CGTCGTGTCCACCTTTTACGATCCCGAACCCATTCTGTGTCCCT6TTTCATGCCCAATGCCGCTATCGAACTGATGGAGGTCGACCCTATCGGACACC 

TCTACATTTTCGCTACCTGTCTGGGACTGTCCTACGATGGCCTCCTGGGAGACAATAGGAGATACGTCGAGCCTCCCGAAATGATTGGCCCTATGA6A 

CCCGAACAGTTTAGCGATGAGGTCGAGCCT6CCACACCCGAAGAGG6AGAGGCTGGCGGATTCTTTCCCTGGCTGAAAGTGTATTACTATAGGTTTGT 

GATTGGCCTCSySGGTCTGGCAATGGGAAGTGATTAGCTGTGCCGCTATGGAAAGGAGAAGGCTCTGGGGAAGCATTCAGTCCAGGTATATCTCC^ 

CCGTGTGGACCTCCCCCAGAAGGCTCGTGGAAGCCGCTCTGATGGAGACACACCTCAGCTCCAAGAGATACACAGAGGAAGCCGGAGGCTTTTTCCCT 

TGGCTCAAGGTCTACTATTACAGAGCCGCTATGTCCCTGGAACAGAGAAGCCTCCACTGTAAGCCTGAGGAAGCCCTCGAGGCTCAGCAAGAGGCTCT 

GGGACTGGTCTGCGTCCAGGCTGCCACAAGCTCCAGCTCCCCCCTCGTGCTCGGCACACTGGAAGAGGTCCCCACAGCCGGAAGCACAGACCCTCCCC 

AAAGCCCTGCCCTCGAGCTCCTGCCTAGGGAACTGTTTCCCCCTCTGTTTATGGCTGCCTTTGACGGAAGGCATAGCCAAACCCTCAAGGCTATGGTC 

GAGCTCAGCGTCCTGGAAGTGTTTGAGGGAAGGGAAGACTCCATCCTCGGCGATCCCAAAAAGCTCCTGACACAGCATTTCGTCCAGGAAGAGTCCCT 

GCAACTGGTCTTCGGAATCGATGTGAAAGAGGCTGACCCTACCGGACACTCCTACGTCCTGGTCACCTGTCTGGGACTGTCCCCCGATCCCCCTCAGT 

CCCCCCAAGGCGCTAGCTCCCTGCCTACCACAATGAATTACCCTCTGTGGAGCCAAAGCTATGAGGATAGCTCCGCCGCTATGCAAGCCGAAGGCCAA 

GGCACAGGCGGAAGCACAGGCGATGCCGATGGCCCTGGCGGACCCGGAATCCCTGACGGACCCGGACAGTGTTTCCTCCCCGTCTTCCTCGCCCAACC 

CCCTAGCGGACAGAGAAGGGCTGCCACCTGGGGCGAAGGCCTCCCCTCCCAGCCTATCATTCACACATGCGTCTACTTTTTCCTCCCCGATCACCTCA 

GCTTTGGCAGACCCTTTAGCACAAGCTGTATCCTCGAGTCCCTGTTTAGGGCTGTGATTACCAAAAAGGTCGCCGATCTGGTCGGCTTTCTGCTCCTG 

AAATACAGAGCCGCTATGCAAGCCGAAGGCAGAGGCACAGGCGGAAGCACAGGCGATGCCGATGGCCCTGGCGGACCCGGAATCCCTGACGGACCCGG 

AGACGGACCCGATGGCCAAGAGATGGACCCTCCCaVATCCCGAAGAGGTCAAGACACCCGAAGAGGAAATGAGAAGCCATTAC 

GCTGTCTGCTiACAGCTCTiGCCTCCTGATGTGGATTACCCS^TGCTTTCTGCCTGTGTTTCTGGCTa^GCCTCCCTCCGG 

ACACTGCAAGACCTCGTGTTTGACGAATGC6GAATCACAGACGATCAGCTCCTGGCTCT6CTCCCCTCCCTGTCCCTGGGAGACCCTAAGAAACTGCT 

CACCCAACACTTTGTGCSVAGAGAATTACCTCGAGTATAGGCAAGTGCCTGGCTCCGACCCTGCCTGTGAGGCTCTGGAAGCCC^ 

GCCTCGTGTGTGTGCAAGCCGCTACCTCCAGCTCCAGCCCTCTGGTCCTGGGAACCCTCGAGTTTTACCTCGCCATGCCCTTTGCC^ 

GCTGAGCTCGCa^GAAGGTCCCTGGCTCAGGATGCCCCTCCCCTCCCCGTCGAGGAAGCCCCTAGGGGAGTGAGAATGGCTGCCAGACTGO^GG^^ 

T6CCTGGAGACTGGAACCCGAAGACGGAACCGCTCTGT6TTTCATTTTCGCTGCCGAGCAATTCTCCGACGAAGTGGAACCCGCTACCCCTGAGGAAG 

GCGAACCCGCTACCCAAAGGO^AGACCCTGCCGCTGCCCAAGAGGGAACmTGAACTATCCCCTCTGGTCCCAGTCCTACGAA 

GAAGAGGAAGGCCCTAGCACATTCCCTGACCTCGAGTCCAACCAAGAGGAAGAGGGACCCTCCACCTTTCCCGATCTGGAAAGCGAATTCCAAGCCGC 
TCTGTCCAGGAAAGTGGCTGAGCTCGTGCATGTGGATCTGTTTCTGAAAGAGGGAGCCTGTGACGAACTGTTTAGCTATCTGATTGAGAAAGTGAAAA 
GGAAAAAGAATGTGCTCAGGCTCACCATTAGGCTCACCGCTGCCGATCACAGACAGCTCCAGCTCAGCATTAGCTCCTGCCTCCAGCAACTGTCCCTG 
CTCATGTGGATCACAGCCGCTATGCCTCTGGAACAGAGAAGCCAACACTGTAAGCCTGAGGAAGGCCTCGAGGCTAGGGGAGAGGCTCTGGGACTGGT 
CGGCGCTGACGCTGACGGACCCGGAGGCCCTGGCATTCCCGATGGCCCTGGCGGAAACGCTGGCGGACCCGGAGAGGCTGGCGCTACCGGAGGCAGAT 
ACGAATTCCTCTGGGGACCCAGAGCCCTCGTGGAAACCTCCTACGTCAAGGTCCTGCATCACATGGTGAAAATCTCCGGCGGACCCCATATCACAAAC 
TGTAGGCTCAGCGAAGGCGATGTGATGCACCTCAGCCAAAGCCCTAGCGTCAGCCAACTGTCCGTGCTCAGCCTCAGCGGAAACTATCTGGAATACAG 
ACAGGTCCCCGGAAGCGATCCCGCTTGCTATGAGTTTCTGTGGGGCCCTAGGGCTCTGGTCGAGACAAGCTATGTGATTTTCTCCAAGGCTAGCTCCA 
GCCTCCAGCTCGTGTTTGGCATTGAGCTCATGGAAGTGGATCCCATTGGCCATCTGTATATCTTTCAGGCTCTGTATGTGGATAGCCTCTTCTTTCTG 
AGAGGCAGACTGGATCAGCTCCTGAGACACGTCATGAATCCCCTCGAGACACTGTCCTACATTGCCCAATTCACAAGCCAATTCCTCAGCCTCCAGTG 
TCTGCAAGCCCTCTACGTCGACTCCCTGTTTTTCCTCAGGGGAAGGCTCGGCCAACACCTCCACCTCGAGACATTCAftAGCCGTCCTGGATGGCCTCG 
ACGTCCTGCTCGCCCAAGAGGTCAG6CCTAGGAGATG6AAATTCATTAGGCTCACCGCTGCCGATCACAGACAGCTCCAGCTCAGCATTAGCTCCT6C 
CTCCAGCAACTGTCCCTGCTCATGTGGATCACATGCTGTAAGAAACTGAAAATCTTTGCCATGCCCATGCAGGATATCAaAATGATTCTGAAAATGGT 
CCAGCTCGACTCCATCGAAGACCTCGGCGCTCCCAGAGGCCCTCACGGAGGCGCTGCCTCCGGCCTCAAC6GATGCTGTAGGTGTGGCGCTAGGG

ccgaaagcagactgctcaagaaagtggctgacctcgtgggattcctcctgctcaagtatagggctagggaacccgtcaccaaagccg;^ 

TCCGTGATTTACGATGGCCTCCTGGGAGACAATCAGATTAT6CCTAAGACAGGCTTTCTGATTATCGTCCTGGTC3VTGATTGCCATGGAGGGAGGCCA 

tagcattagcgctctgcaaagcctcctgc^cacctcatcggactgtccaacgtcacccatgtgctctaccctgtgcctcto^ 

GGGAAGGCGATTGCGCTCCCGAAGAGAAAATCTGGGAGGAACTGTCCGTGCTCGAGGTCTTCGAAGGCAGAGAGGATAGCATTGACCAACTGCT 

CATGTGATGAACCCTCTGGAAACCCTCAGCATTACCAATTGCAGACTGTCCGAGGGAGACGTCTVTGCATCTGTCa^GGGCT 

TGTGAAAGTGCTCGAGTATGTGATTAAGGTCAGCGCTAGGGTCAGGTTTTTCTTTCCCTCCCTGAGAAGCTiATCTGACA 

CCCTCGAGTCCTACGAAGACATTmCGGAACCCTCCACCTCGAGAGACTGGCTTACCTCGCCGCTATGCTCATGGCTCAGGM^ 

ATGGCCCAAGGCGCTATGCTCGCCGCrCAGGAAAGGAGAGTGCCTAGGGCTAAGAATTACAAACACTGTT^^^ 

AAGCCTCCAGCTCGTGTTTGGCATTGACGTCAAGGAAGCCGATACCCTCGTGGAAGTGACACTGGGAGAGGTCCCCGCTGCCGAA^ 

CCCAAAGCCCTCAGGGAGCCTCCAGCCTCCCO^CACACGCTAGGCTCAGGGAACTGCTCTGCGAACTGGGAAGGCCTA^ 

AATCCCTGTCCCCATTGCGGAGACAGAGTGATGCTGACAGACGTCAGCCCTGAGCCTCTGCAAGCCCTCCTGGAAAGGGCTAGCGCTACCCTCC^^ 

TCTGGTCTTCGATGAGT6TGAGGATGAGGGAGCCTCCGCCGGACAGGGACCCAAACCCGAAGCCGATAGCCAAGAGCAAGGCCATCCCCAAACCGGAT 

GCGAATGCGAAGAGATGCTGGGl^GCGTCGTGGGAAACTGGCAGTATTTCTTTCCCGTCATCTTTAGCAAAGCCTCCAGGTCCCTGC^^ 

GGAGCCGCTATGTCCTGGAGAGGCAGAAGCACATACAGACCCAGACCCAGAAGGTATGTGGAACCCCCTGAGATGATCGGACCCATGAGGCCTTACAT 

TAGCATGAGCGTCTGGACAAGCCCTAGGAGACTGGTCGAGCTCGCCGGACAGTCCCTGCTCAAGGATGAGGCTCTGGCTATCGCTTACGATGGCAGAG 

AGCATAGCGCTTACGGAGAGCCTAGGAAACTGCTCACCCAAGACCTCGTGCAAGAGAAATACCTCGAGTATAGGCAACAGTGTTTCCTCCCCGTCTTC 

CTCGCCCAAGCCCCTAGCGGACAGAGAAGGGCTGCCGTGAAAGTGCTCCACCATATGGTCAAGATTAGCGGAGGCCCTCACATTAGCTATCCCCCTCT 

GCATGAGTGGGTGCTCAGGGAAGGCGAACAGCTCCACATTACCATGCCCTTTAGCTCCCCCATGGAGGCTGAGCTCGTGAGAAGGATTCTGTCCAGGG 

ATGCCGCTCCCCTCCCCAGACCCGGAGTGCTCCTGAAAGAGTTTACCGTCAGCGGAAACATTCTGACAATCAGACTGACAGCCGCTGACCATAGGCAA 

CTGCAACTGTCCAAGATGATCCTCAAGATGGTGCAACTGGATAGCATTGAGGATCTGGAAGTGACATGCACATGGAAACTGCCTACCCTCGCCAAATT 

CTCCTTCCTGATCATTGTGCTCGTGATGATCGCTATGGAAGGCGGACACGCTCCCGAAGAGGAAATCTGGGAG6AACTGTCCGTGATGGAGGTCGAGG 

TCACCTGTACCTGGAAGCTCCCCACACTGGCTAAGTTTAGCCCTTACCTCGGCCAAATGATTAACCTO^GGAGACTGCTCCTGTCCGAGGGAC^ 

GCCAGAGGCGAAGCCCTCGGCCTCGTGGGAGCCCAAGCCCCTGCCACAGAGGAACAGGAAGCCGCTAGCTCCAGCTCCSITCTCCTACCCTC^ 

CGAATGGGTCCTGA6AGAGGGAGAGGAAGCCGCT(^CATTCACGCTAGCTCCTACATTAGCCCTGAGAAAGAG6AA(^GTATATCGCTCA^^ 

CCCAGTTTCTGTCCCTGCAATGCCTCGGCAATGCCGGAGGCCCTGGCGAAGCCGGAGCCACAGGCGGAiWMGGACCCS^GAGGCGCTGGCGCTG 

GCCTCCGGCCCTGGCGGAGGCCCTAGGGGAGCCGGAGCCGCTAGGGCTAGCGGACCCGGAGGCGGAGCCCCTAGGGGACCCCATGGCGGAGCCGCTAG 

CGGACTGAATCAGGGAGCCTCCGCCTTTCCCS^CAACCATTAACTTTACaiGACAGAGACAGCCTAGCG 

CTCTGGCTAGGAGAAGCCTCGCCCa^GACGCTCCCCCTCTGCCTGTGCCTGGCGTCCTGCTCAAGGAATTCACAGTGTCCGGCAAT^^ 
TTCGATGGCAGACACTCCCAGACACTGAAAGCCATGGTGa^GCCTGGCCCTTTACCTGTCTGCCTCTGGGAGTGCTC^^ 

CAGAGTGAGATTCTTTTTCCCTAGCCTCAGGGAAGCCGCTCTGAGAGAGGAAGAGGAAGGCGTCGCCGCTGGCATTACCGATGACCAACTGCTCGC^ 

TCCTGCCTAGCCTCAGCCSVTTGCTCCCAGCTCACCACACTGTCCTTCTATGGCAATAGCATTGTGAAAACCCCTGAGGAAGAGATGAGGTCCCACTAT 

GTGGCTCAGAO^GGCATTCTGTGGCTGCTCATGAATAACTGTTTCCTCAACCTCATCTCCAGCTCTCTGCAACAGCTCAGCCTCCTG^^ 

CCAATGCTTTCTGCCTGTGTTTCTGGCTCAGGCTCCCTCCGGCCAATACCTCATCGAAAAGGTCAAGAGAAAGAAAAACGTCCTGAGACTGTGTTGCA 

AAAAGCTCAAGATTTTCGCTATGCCTATGCAAGACATTGCCAGAGAGCCTGTGACMAGGCTGAGATGCTGGAAAGCGTCATC^^ 

TGCTTTCCCGAAATGTTTGGCAAAGCCTCCCTGGTCAGGAGAATCCTCAGCAGAGACGCTGCCCCTCTGCCTAGGCCTGGCGCTGTGCTCAAGGATTT 

CACAGTGTCCGGCAATCTGCTCCACTGTAGCCM.CTGACAACCCTCAGCTTTTACGGAAACTCCATCTCC7iTCTCCGCCCTCCAGTCCCTGCTC^^ 

ATCTGATTGGCCTCATGGTCTGGCTCAGCGCTAACCCTTGCCCTCACTGTGGCGATAGGACATTCTATGACCCTGAGCCTATCCTCTGCCCTTGCTTT 

ATGCCTGCCGCTAGCTGGAGCCAAAAGAGAAGCTTTGTGTATGTGTGGAAGACATGGGGAGAGGGACTGCCTAGCCAACCCATTATCCATACCTGTC^ 

TTCTGCCTGACCT^TCTGTCCTTCGGAAGGCCTTTCCATCTGAATTTCTGTGACTTTCTGGCTGCCCCCTATCTGGGACAGATGATCy^ 

CTCCTGCTCAGCCATATCCATGCCTCCAGCTATATCTCCCCCGAAAAGGAAGAGCAACaGGCTCCCGCTACCGAAGAGCTiAGAGGCTGCCTC^^^ 

CAGGACACTGGTCGAGGTCACCCTCGGCGAAGTGCCTGCCGCTGAGTCCCajGGCTTGGCCTTTCACATGCCTCCCCCTC^^ 

AGCATCTGCATCTGGAAACCTTTAAGGCTGTGCTCGACGGACTGTCCACCGAAGCCGAACAGCCTTTCAOT^ 

CTCAAGGAAGGCGCTTGCGATGAGCTCTTerCCGCCGAAGTGCCTGGCGCTCAGGGACAGCAAGGCCCTA 

CAGGATGGCCGCTAGGCTCCAGGGAGGCGCTCCCyVGAGGCCCTa^CGGAGGCGCTGCCTCCGCCCAAGACGGAAGGTGTCCCTGTGGCGCT^^ 

CCGATAGCA6ACTGCTCG6CCCTAGGGGAGCCGGAGCCGCTAGGGCTAGCGGACCCAGAGGCGGAGCCCCTAGGGGACCCCATGGCGGAGCCGCTAGC 

GCTCAGGATCTGGCTGGCCAAAGCCTCCTGAAAGACGAAGCCCTCGCCATTGCCGCTCTGGAACTGCTCCCCAGAGAGCTCTTCCCTCCCCTCTTCAT 

GGAGCCTGCCACACAGAGACAGGATCCCGCTGCC6CTCAGGAAGGCGAAGACGAAGGCGCTAGCGCTGGCCAAGGCCCTAAGCCTGAGGCTCCCGAAG 

CCGCTCAGCCTATGACAAAGAAAAGGAAAGTGGATGGCCTCAGCACAGAGGCTGAGCAACCCTTTATCCCTGTGGAAGTGCTCGGCAGATGCCCTTGC 

GGAGCC^iGAAGGCCTGACTCCAGGCTCCTGCAACTGCATATCACAATGCCTTTCTCCAGCCCTATGGAAGCCGAACCCGGAGCCGTCCTGAAAGACTT 

TACCGTCAGCGGAAACCTCCTGTTTATCAGACTGAO^GCCGCTGACCATAGGCAACTGCAACTGTCCGAGGATATCCATGGCACACTGCATCTGGAA^ 

GGCTCGCCTATCTGCATGCCAGACTGAGAGAGCTCCTGTGTGAGCTCGGCAGACCCTCCGACTCCCAGGAACAGGGACACCCTCAGACAGGCTGTGAG 

TGTGAGGATGGCCCTGACGGACAGGAAATGGATCCCCCTAACCCTGAGGAATTCGTCATCGGACTGAGAGTGTGGCAGTGGGAGGTCATCTCCTGCAA 

ACTGATTAAGAGAGGCACAACCAGACAGCCTGCCGCTGACGCTGACGGACCCGGAGGCCCTGGCATTCCCGATGGCCCTGGCGGAAACGCTGGCGGAC 

CCGGAGAGGCTGGCGCTACCGGAGGCAGACCCACAGGCCATAGCTATGTGCTCGTGACATGCCTCGGCCTCAGCTATGACGGACTGCTCGGC6ATAAC 

CAAATCATGCCCAAAACCGGATTCCTCCTGCTCAAGTATAGGGCTAGGGAACCCGTCACCAAAGCCGAAATGCTCGGCTCCGTGGTCGGCAATTGGCA 

ATACTTTTTCCCTACCGGAATCCTCTGGCTCCTGATGAACAATTGCTTTCTGAATCTGTCCCCCAGAAAGCCTGCCGCTGAGTTTCAGGCTGC^ 

GCAGAAAGGTCGCCGAACTGGTCCACTTTCTGCTCCTGAAATACAGAGCCAGAGAGCCTGTGACAAAGGCTGT6CCTGACTCCGACCCTGCCAGATAC 

GAATTCCTCTGGGGACCCAGAGCCCTCGCCGAAACCTCCTACGTCAAGGTCCTGGAATACGTCGGCT6TTGCAGATGCGGAGCCAGAGGCCCTGAGTC 

CAGGCTCCTGGAATTCTATCTGGCTATGCCTTTCGCTACCCCTATGGAAGCCGAA6CCACAT6CCTCGGCCTCAGCTATGACGGACT6CTCGGCGATA 

ACCAAATCATGCCCAAAGCCG6ACTGCTC3^TCATT6TGCTCGCCATTGG<^TGCCGGAGGCCCTGGC 

AGAGGCGCTGGCGCTGCCAGAGCCTCCGGCCCTA66GGA 
Cassettes for construction of a full-length HIV Savine

Cassette A1

ggatccaccATGACAGGCCCTTGCACAAACGTCAGCACCGTGCAATGCACACACGGAATCAGACCCGTCGTGTCCA 

CCCAACTGCTCCTGAATGGCTCCGTGAGAAGCCTCTACAATACCGTCGCCACACTGTGGTGCGTCCACCAAAGGAT 

TGACGTCAGGGACACAAAGGAAGCCCTCGACAAAATCGAACTCGGCGATGGCGGAGGCGCTGAAAGGCAAGGCACC 

TCCAGCTCCTTCAACTTTCCACAAATCACACTGTGGCAAAGGCGTCTGGTCACCGAACCCTTCAGAAAAAAGAATC 

CCGATATGGTGATTTACCAGTACATGGACGATCTGTATGTGGGAAGCGATCTGGAAATCGGACAGCATTTTACCAC 

ACCCGATAAGAAACACCAAAAGGAACCACCATTCCTCTGGATGGGATACGAACTGCATCCCGATAGGTGGACCGTC

CAGCCTCTTAATTTCCCTCAGATTACCCTCTGGCAGCGTCCCCTCGTGACAATCAAAATCGGCGGACAGCTCATAG 

AGGCTCTGCTCGACACAGGCTCCTATGGCAGAAAGAAACGTAGGCAACGTAGACGCGCTGCTCAGAGCAGCAAGGA 

TCACCAATACCCTATCTCTGAGCAACCCCTCTCCTTCTTTAGGGAAAACCTGGCTTTCCAGCAAGGTAAAGCCAGA 

GAGTTTTCCAGCGAACAGACAAGAGCCAATAGCTCCGCCTCCAGGAAGAGCCCCCAAATCTCCGGCGAAAGCTCCG 

TCATTCTGGGATCTGGCACCAA2UUVCGCCGCTACTAGAAGAATCGAAGTGAAAGATACCAAAGAGGCTTTGGATAA 

GATTGAGGAGGTGCAAAAGAAAAGCGAGCAAAAGACACAACAGGCTGCCGCTAAAGCCGGATACGTCACCGATAGG 

GGAAGGCAAAAGATTATCTCCCTGACAGAGACAACCAATCAGAAAACCGAACTGCATGCCATTCAAGAAGCCACTA 

CCACACTGTTTTGCGCCAGCGATGCCAAAGCCTATGAGACAGAGGTCCACAATGTGTGGGCCACACACGCT^ 

CCCCGCTGACGATACAGTGCTGGAGGAGATGAACCTCCCCGGAAAATGGAAGCCTAAGATGATTGGCGGAATCGGC 

GGATTCATTAAGGTGAGAAAAATCGGACCCGAAAACCCTTACAATACCCCAATCTTCGCTATCAAGAAAAAGGACT 

CCACCAAATGGAGAAAGCTCGTGGATTTCAGAGTTAGGATTATCAATATCCTCTACCAAAGCAATCCCTATCCTAG 

CTCCGAAGGCTCCAGGCAAACCAGAAAGAATAGGAGAAGGAGATGGGGAGGCGAACGGGGTAGGGATAGGTCCGTG 

AGACTGGTCAACGGATTCTTAGCCCTCGCCTGGGACGATCTGAGAAACCTCTGCCTCTTCGAAAACCTCTGGGTCA 

CCGTCTACTATGGCGTCCCCGTCTGGAGAGAGGCTGCCACAACCCTCTTCTGTGCCTCCGACGCTAAGGCTTACGC 

TGCCATGGCTGGCAGAAGCGGCGGCACAGACGAAGAGCTCCTGAGGGCTATCAGAATCATTAACATTCTGTATCAG 

TCCAACCCTTACCCTTCCGCTAGTATGAGAATCAGAACCTGGAACAGCCTGGTCAAGCATCACATGCACATCTCCA 

AGAAAGCCAAAGGCTGGTTCTATAGGCATCACTTTGAGGAGTCCGAGCTCGTGAATCAGATTATCGAAAAGCTCAT 

CAAAAAGGAAAAGGTCTACCTATCATGGGTACCAGCCCACAAGGGAATCGGACAAACCAAAGAGCTCCAGAAACAG 

ATTATCAAAATCCAAAACTTTAGGGTCTACTATAGGGATAGCAGAGACCCTATCTGGAAGGGACCCAAAAGCTTTG 

AGGAAATCTGGAACAATATGACATGGATTGAGTGGGAGAGAGAGATTAGCAATTACACAAGCCAAATCTATAAGAT 

TCTGAAACCCGAACCCACAGCCCCTCCCGCTGAGAATTTCAGATTCGGTGAGGAAACTACACCCTCCCAAAAGCAA

GAGCAAAAGGATAAGGAGCAATACGATCAGATTCTTATTGAGATTTGCGGCAAGAAAGCTATTGGTACGGTGCTCG 

TGGGACCTACCCCTGTGAATATCATTGGCAGAATTTACGAAACCTATGGCGATACCTGGGAGGGCGTCGAGGCTCT 

GATCAGAATCCTCCAGCAACTGATGTTTATCCATTTCAGAATCGGATGTTTTCATTGCCAAGTGTGTTTTCTCACC 

AAAGGTCTCGGCATTAGCCACGGAAGGAAAAAGAGAAAACAGAGAAGGGGAGCTCCCCAAGCTGCCATGGACCCCG 

TGGACCCCAAGCTGGAGCCTTGGAAACACCCTGGCTCCCAGCCTAAGACAGCCTGTTACAAATGCTATTGCAAAAA 

GTGCCCTAGCGAAGAGACAACCCCTAGCCAGAAACAGGAACAGAAAGACAAAGAACTCTACCCCCCTTTAGCCAGC 

CTCAAGTCCCTGTTTGGCAATGAGAATTTCAATATGTGGAAGAATGACATGGTGGAACAGATGCAAGAAGACATTA 

TCTTACTATGGGACCAAAGCCTCAAGCCTTGCGTCAAGCTCGACGTCGGCGATGCCTATTTCTCCGTGCCTCTGGA 

TAAAAACTTCAGAAAGTATACCGCTTTCACAATCCCTAGCACAAACAATGAGCAACTGAAAGGCGAAGCCATCCAT 

GGCCAAGTGAATTGCTCACCAGGCATTTGGCAACTGGATTGCACACACCTGGAGGGAAAGATTATCCCTAAGGTCA 

AGCAATGGCCTCTGACAGAGGAAAAGATTAAGGCTCTGACTGAGATTTGCAAAGAGATGGAGGAAGAGGGAAAGAT 

TAGCATGGATGACCTCTACGTCGGCTCCGACCTGG 
AGATTGGCCAACATAGGACCAAAATCGAAGAGCTCAGGGAACACCTCCTGAAATGGGGACTCACCGAAACCACAAA

CCAAAAGACTGAGCTCCAAGCTATCCATCTGGCTCTGCAAGACTCCGGCTTAGAGGTCAACATTGTGACAGACATT 

CCCGCTGAGACTGGTCAAGAGACCGCCTTTTTCATTCTGAAACTGGCTGGCAGATGGCCTGTGAAAGTCATTCACA 

CAGACAATGGCAGGACAAAGATTGAGGAACTGAGACCGCATCTGCTCAAATGGGGCTTCACAACCCCTGACAAAAA 

GCATCAGAAAGAGCCTCCCTTTCTGTCTAGTGTCAAGAAACTGACAGAGGATAAGTGGAACGAACCCCAGAAAATC 

AAGAGACGCAGAGAAAATCACACAATGAATGGCCATACTGCCACAGAGTCCCAGAATCAGGAAGACAGAAACGAAA

AGGAACTGCTGGAGCTCGACAAATGGGCAAGCCTCTGGAATTGGTTTAACATTACCGACACCGGAAATAGCTCCAA 

AGTGTCCCAGAATTACCCTATCGTCCAGAATGTCCAAGGCCAAATGGTCCACCAACCCCTCTCCCCCAGACTCATC 

GGACTGAGAATCGTTTTCGCTGTGCTCAGCATTATCAATAGGGTCAGGCAAGGCTATAGCCCTCTGTCCTT 

CCCTCCCCCTCATCCATCTGCAATACTTTGACTGTTTCGCTGACTCCACCATTAGGAGAGCCATCTTGGGACA^ 

AGTGAGAAGGAGATGCGAATACGCTGTGGGACTCGGAGCCATGTTCCTTGGCTTTCTGGGTGCCGCTGGCTCCACC 

ATGGGCGCTGCCTCCATGACACTGACAGTGCAAGCCTATGACCCTAGCAAAGACCTCATTGCTGAGATTCAGAAAC 

AGGGCCAGGGTCAGTGGACATTTCAGATTTTCCAAGAGCCTTTCAAAAACGGAACCGTCCTGGTCGGCCCTACACC 

CGTCAACATCATCGGAAGGAACATGCTGACACAGCTTGGCCGCACTCTCTy^CTTTCCCATTAGCA^ 

GCTATCTTTCAGTCCAGCATGCCACAGATTCTGGAGCCTTTTAGGATAAAAAACCCTGAGATGGTCATCTATCAGT 

ATCCTAGCCCTCTGACATTCGGATGGTGTTTCAAACTGGTCCCCGTGGACCCCAGCGAAGTdGAAGAGATCAACAA 

GGGCGAAAACAATTGCCCCCTGTTTAGGAAATACACAGCCTTTACCATTCCCTCCATCAATAACGAAACCCCTGGC 

ATTAGGTATCAGTATAACGTCCTGCCTCAGGGATGGGGAAGCACAATGGGAGCCGCCAGCATGACCCTCACCGTCC 

AGGCTAGGCTACTGCTCAGCGGAATCGTCCAGCAACAGAGCAATCTGCTGGAGGAGAATAGGGAAATCCTCAGAGA 

GCCTGTGCATGGCGTCTACTACGATCCCTCCAAGGATCTGGTCGCTGAAATCCAAAAGCAAGGCAGAGAGGAACTG 

TCCACCATGGTGGAXATGGGAAACTACGACCTCGGAGTGGACAATAACCTCGCCGCTATTAGAATCCTGCAACAGC 

TCATGTTCATTCACTTTAGGATTGGCTGCCAGCACTCCAGGATTGGCATCATCCGTCAGAGAAGGGCCAGAGCTCC 

CAGGAAAAAGGGATGCTGGAAGTGTGGCAGAGAGGGACACCAGATGAAGGATTGCACTGAGAGACAGGCTAACTTT 

CTGGGAAAGGATGCCAGACTGGTTATCAAAACCTATTGGGGACTGCATACCGGTGAGAGAGACTGGCACCTCGGCC 

ATGGCGTCAGCATTGAGTGGAGGATAAGGGAAAGGGCTGAGGATAGCGGCAACGAAAGCGAAGGCGACACAGAAGA 

GCTCAGCACATTGGTGGACATGGGCAATTACGATCTGTCTAGCCCTGCCCCCAGGGGACCCGATAGGCTGGAGAGA 

ATCGAAGAGGAAGGCGGAGAGCAAGGCAGAGGCAGAAGCGTCAGGCTCGTGAATGGCAGAGAGGTCGAGGAAGTCA 

ATGAGGGAGAGAATAACTGTCTGCTTCACCCTATCAGTCAACATGGCATGGAAGACGAAGAGAGAGAGGTCAATAG 

CGATATCAAAGTGGTCCCCAGAAGGA2^GCCAAAATCATTAGGGATTACGGAAAGCAAATGGCTGGCGATGACTGT 

GTGGCCAGCTTCTCTTCCGAGCAAACAGGGGCTAACTCCTCTACAAGCAGAAAGCTGGGAGACGGAGGCGGAGCCG 

ACAGACAGGGAACAAGCTCCAGCTGTTTCAATTGCGGCAAAGAGGGACACATTGCCAAAAACTGTAGGGCCCCTCG 

CAAGAAAGGTTGTTGGAAATGCGGAAAGGAAGGCCATCAAATGAAAGACTGTACCGAAAGGCAAGCC^ 

GGCAAAATCTGGCCCTCCAACAAAGGCAGACCCGGAAACTTTCTCCAAAGCAAATGGCTCTGGTATATCAAAATCT 

TTATCATGATCGTCGGTGGACTGATTGGCCTCAGGATTATCTTTGCCGTCCTGTCCATCGTTAACGGAGCCGTGAG 

CCGAGACCTCGATAAACATGGCGCTATTACAAGCTCCAATACCGCTGCCAATAACGCTGACTGTGTCTGGCTGAAG 

GCTGCTGCCATGACACCCCTGGAGATCATCGCTATCGTCGCCTTTATCGTCGCCCTCATCATAGCCATTGTGGTCT 

GGACAATCGTCTACATTGAGTATGTCGACtgaagatctgaattc 
A2 fragment

ggatccaccATGACAGGCCCTTGCACAAACGTCAGCTCCGTGCAATGCACACACGGAATCAAACCCGTCGTGTCCA
CCCAACTGCTCCTGAATGGCTCCCTGAAAAGCCTCTACAATACCGTCGCCACACTGTGGTGTGTCCACCAAAGGAT 
TGAGGTCAAGGACACAAAGGAAGCCCTCGACAAAATCGAACTCGGCGATGGCGGAGGGGCTGAAAGGCAAGGCACC 
TCCAGCTCCATCSy^CTTTCCACAAATCACACTGTGGCAAAGGCCTCTGGTCA 

CCGAAATGGTGATTTACCAGTACATGGACGATCTGTATGTGGGAAGCGATCTGGAAATCGGACAGCATTTTACCAC 

ACCCGATAAGAAACACCAAAAGGAACCACCATTCCTCTGGATGGGATACGAACTGCATCCCGATAGGTGGACCGTC 

CAGCCTTTTAATTTCCCTCAGATTACCCTCTGGCAGCGTCCCCTCGTGACAATCAAAATCGGCGGACAGCTCATAG 

AGGCTCTGCTCGACACAGGCTCCTATGGCAGAAAGAAACGTAGGCAACGTAGACGCGCTCCTCAGAGCAGAAAGGA 

TCACCAATACCCTATCTCTGAGCAACCCCTCTCCTTCTTTAGGGAAAACCTGGCTTTCCAGCAAGGTAAAGCCAGA 

GAGTTTTCCAGCGAACAGACAGGAGCCAATAGCTCCGCCTCCAGGAAGAGCCCCCAAATCTCCGGCGAAAGCTCCG 

TCATTCTGGGATCTGGCACCAAAAACGCCGCTACTAGAAGAATCGATGTGAGAGATACCAAAGAGGCTCTGGATAA 

GATTGAGGAGGAGCAAAACAAAAGCAAGGAAAAGACACAACAGGCTGCCGCTAAAGCCGGATACGTCACCGATAGG 

GGAAGGCAAAAGATTATCTCCCTGACAGAGACAACCAATCAGAAAACCGAACTGCATGCCATTCAAGAAGCCGATA 

CCACACTGTTTTGCGCCAGCGATGCCAAAGCCTATGACACAGAGGTCCACAATGTGTGGGCCACACACGCTTGCGT 

CCCCGCTGACGATACAGTGCTGGAGGAGATGAACCTCCCCGGAAAATGGAAGCCTAAGATGATTGGCGGAATCGGC 

GGATTCATTAAGGTGAGAAAGATCGGACCCGAAAACCCTTACAATACCCCAATCTTCGCTATCAAGAAAAAGAACT 

CCACCAAATGGAGAAAGCTCGTGGATTTCAGAATTAGGATTATCAAAATCCTCTACCAAAGCAATCCCTATCCTAG 

CTCCGAAGGCa.CCAGGCAAACCAGAAAGAATAGGAGAAGGGGA!raGGGAGGCQAACAGGGTAGGGATAGGTCCGTG 

AGACTGGTCAACGGATTCTTAGCCCTCGCCTGGGACGATCTGAGAAGCCTCTGCCTCTTCGACAACCTCTGGGTCA 

CCGTCTACTATGGCGTCCCCGTCTGGAGAGAGGCTAACACAACCCTCTTCTGTGCCTCCGACGCTAAGGCTTACGC 

TGCCATGGCTGGCAGCAGCGGCAGCACA6ACGAAGAGCTCCTGAAG6CTGTCAGAATCATTAAGATTCTO 

TCCAACCCTTACCCTTCCGCTAGTATGAAAATGAGAACCTGGAAGAGCCTGGTCAAGCATCACATGTACATCTCCA 

AGAAAGCCAATGGCTGGTTCTATAGGCATCACTTTGAGGAGTCCGAGGTCGTGAATCAGATTATCGAAAAGCTTAT 

CAAAAAGGAAAAGGTCTACCTATCATGGGTACCAGCCCACAAGGGAATCGGACGAACCAAAGAGCTCCAGAAACAG 

ATTATCAAAATCCAAAACTTTAGGGTCTACTATAGGGATAGCAGAGACCCTATCTGGAAGGGACCCAAAAGCCTTG 

AGGAAATCTGGAACAATATGACATGGATTCAGTGGGAGAGAGAGATTAGCAATTACACAAACCTAATCTATAAGAT 

TCTGAGACCCGAACCCACAGCCCCTCCCGCTGAGAATTTCGGATTCGGTGAGGAAACTACACCOTCCCAAAAGC^ 

GAGCCAAAGGATAAGGAGCAATACGATCAGATTATTATTGAGATTTGCGGCAAGAAAGCTATTGGTACAGTGCTCG 

TGGGACCTACCCCTGTGAATATCATTGGCAGAATTTACGAAACCTATGGCGATACCTGGGAGGGCGTCGAGGCTCT 

GATCAGAATCCTCCAGCAACTGATGTTTATCCATTTCAGAATCGGATGTTTTCATTGCCAAGTGTGTTTTCTCACC 

AAAGGTCTCGGCATTAGCCACGGAAGGAAAAAGAGAAAACAGAGAAGGCGAGCTCCCCAAGCTGCCATGGACCCCG 

TGGACCCCAACCTGGAGCCTTGGAAACACCCTGGCTCCCAGCCTAAGACAGCCTGTAACAAATGCTATTGCAA2^ 

GTGCCCTAGCGAAGAGACAACCCCTAGCCAGAAACAGGAACAGAAAGACAAAGAACTCTACCCCCCTTTAGCCAGC 

CTCAAGTCCCTGTTTGGCAATGACAATTTCAATATGTGGAAGAATAACATGGTGGAACAGATGCAAGAAGACATTA 

TCTCACTATGGGACCAAAGCCTCAAGCCTTGCGTCAAGCTCGACGTCGGCGATGCCTATTTCTCCGTGCCTCTGGA 

TAAAAACTTCAGAAAGTATACCGCTTTCACAATCCCTAGCACAAACAATGAGCAACTGAAAGGCGAAGCCATGCAT 

GGCCAAGTGAATTGCTCACCAGGCATTTGGCAACTGGATTGCACACACCTGGAGGGAAAGATTATCCCTAAGGTCA 

AGCAATGGCCTCAGACAGAGGAAAAGATTAAGGCTCTGACTGAGATTTGCACAGAGATGGAGCAAGAGGGAAAGAT 

TAGCATGGATGACCTCTACGTCGGCTCCGACCTGGAGATTGGCCAACATAGGACCAAAATCGAAGAGCTCAGGGCA 
CACCTCCTGAGATGGGGACTCACCGACACCACA?^CCAAAAGACTGAGCTCCACGCTATCCATCTGGCTCTGCAAG
ACTCCGGCTTAGAGGTCAACATTGTGACAGACATTCCCGCTGAGACTGGTCAAGAGACCACCTATTTCATTCTGAA 
ACTGGCTGGCAGATGGCCTGTGAGAATCATTCACACAGACAATGGCAGGACAAAGATTGAGGAACTGAGACCGCAT 
CTGCTCAAATGGGGCTTCACAACCCCTGACAAAAAGCGTCAGAAAGAGCCTCCCTTTCTGTCTAGTGTCAAGAAAC 
TGACAGAGGATAAGTGGAACAAACCCCAGAAAATC^GGGACACAGAGAAAATCACACI?^^ 

CACAGAGTCCCAGAATCAGCAAGACAGAAACGAAAAGGAACTGCTGGAGCTCGACAAATGGGCAAGCCTCTGGAAT 

TGGTTTAACATTACCGACACCGGAAGTAGCTCCCAAGTGTCCCAGAATTACCCTATCGTCCAGAATCTCCAAGGCC 

AT^ATGGTCCACCAACCCATCTCCCCCAGACTCGTCGGACTGAGAATCATTTTCGCTGTGCTCAGCATTATCAATAG 

GGTCAGGCAAGGCTATAGCCCTCTGTGCTTCCAAACCCTCACCCTCATCCATCTGTATTACTTTGACTGTTTCGCT 

GACTCCACCATTAGGAGAGCCATCCTTGGACACAGAGTGAGCAGGAGATGCGAATACGCTGTGGGAATCGGAGCCA 

TGTTCCTTGGCTTTCTGGGTGCCGCTGGCTCCACCATGGGCGCTGCCTCCATCACACTGACAGTGCAAGCCTATGA 

CCCTAGCAAAGACCTCATTGCTGAGATTCAGAAACAGGGTCAGGATCAGTGGACATATCAGATTTTCCAAGAGCCT 

TTCAAAAACGGAACCGTCCTGGTCGGCCCTACACCCGTCAACATC7VTCGGAAGGAACCTGCTGACACffi.GATA 

GCACCCTCAACTTTCCCATTAGCAAAGGCAGCCCTGCTATCTTTCAGTCCAGC:a.TGACACAGA 

TAGGAAACAAAACCCTGACATGGTCATCTATCAGTATCCTAGCCCTCTGACATTCGGATGGTGTTTCAAACTGGTC 

CCCGTGGACCCCAGCGAAGTGGAAGAGACCAACAAGGGCGAAAACAATTGCCTCCTGTTTAGGAAATACACAGCCT 

TTACCATTCCCTCCACCAATAACGAAACCCCTGGCATTAGGTATCAGTATAACGTCCTGCCTCAGGGATGGGGAAG 

CACAATGGGAGCCGCCAGCATGACCCTCACCGTCCAGGCTAGGGRACTGCTCAGCGGAATCGTCCAGCAACAGAA 

AATCTGCTGGAGGAGAATAGGGAAATCCTCAAAGAGCCTGTGCATGGCGTCTACTACGATCCCTCCAAGGATCTGA 

TCGCTGAAATCCAAAAGCAAGGCACAGAGGAACTGTCCGCCTTGGTGGATATGGGAAACTACCACCTCGGAGTGGA 

CAATAACCTCGCCGCTATTAGAATCCTGCAACAGCTCATGTTCATTCACTTTAGGATTGGCTGCCAGCACTCCAGG 

ATTGGCATCATCCGTCAGAGAAGGGCCAGAGCTCCCAGGAAAAAGGGATGCTGGAAGTGTGGCAAAGAGGGACACC 

AGATGAAGGATTGCACTGAGAGACAGGCTAACTTTCTGGGAAAGGATGCCAGACTGGTTATCAAAACCTATTGGGG 

ACTGCATACCGGTGAGAGAGACTGGCACCTCGGCCATGGCGTCAGCATTGAGTGGAGGACAAGGGAAAGGGCTGAG 

GATAGCGGCAACGAAAGCGAAGGCGACAGAGAAGAGCTCAGCACAATGGTGGACATGGGCAATTACGATCTGTCTA 

GCCCTGCCCCCAGGGGACCCGATAGGCTGGAGAGAATCGAAGAGGAAGGCGGAGAGCAAGACAGAGACAGAAGCGT 

CAGGCTCGTGAATGGCAGTGAGGGCGAGGAAGTCAATAAGGGAGAGAATAACTGTCTGCTCCACCCTATGAGTCAA 

CATGGCATGGAAGACGAAGACAGAGAGGTCAATAGCGATATCAAAGTGGTCCCCAGAAGGAAAGCCAAAATCATTA 

GGGATTACGGAAAGCAAATGGCTGACGATGACTGTGTGGCCGGCTTCTCTTCCGAGCAAACAAGGGCTAACTCCCC 

TGCAAGCAGAAAGCTGGGAGACGGAGGCGGAGCCGACAGACAGGGAACAAGCTCCAGCTGTTTCAATTGCGGCAAA 

GAGGGACACATTGCCAAAAGCTGTAGGGCCCCTCGCAAGAAAGGTTGTTGGAAATGCGGAAGGGAAGGCCATCAAA 

TGAAAGACTGTACCGAAAGGCAAGCCAATTTCCTCGGCAAAATCTGGCCCTCCAAAAAAGGC^ 

TCTCCAAAGCAAATGGCTCTGGTATATCAAAATCTTTATCATGATCGTCGGTGGACTGATTGGCCTCAGGATTATC 

TTTGCCGTCCTGTCCATCATTAACGGGGCCGTGAGCCGAGACCTCGATAAACATGGCGCTATTACAAGCTCCAATA 

CCGCTGCCAATAACCCTGACTGTGTCTGGCTGGAGGCTGCTGCCATGACACCCCTGGAGATCATCGCTATCGTCGC 

CCTTATCGTCGCCCTCATCATAGCCATTGTGGTCTGGACAATCGTCTACATTGAGTATGTCGACtgaagatctgaa 

ttc 
B1 fragment

ggatccaccATGCTCGAGAATATGCTCACCCAAATCGGATGCACACTGAATTTCCCTATCTCCCCCATTGAGACAG 

TGCCTGTGAAACTGAAACCCGGAATGGATGGCGCCGCCACCTTTAGGCCTGGCGGAGGCAATATCAAAGACAATTG 

GAGAAGCGAACTGTATAAGTATAAGGTCGTGAAGATTAAGCCTCTGGGAATCACATGGATTCCCGAATGGGAGTTC 

GTCAACACACCCCCACTGGTCAAGCTATGGTATCAGCTGGAGAAAGACCCTATCGTTGGCGTTGAGCCTCAGGATC 

TCAACACGATGCTGAATCTTGTAGGAGGCCATCAGGCCGCTATGCAAATGCTGAAAGAGACAATCAATGAGGAAGC 

CTCTGTCCTGTTTCTGGATGGCATTGACAAAGCTCAAGAGGAACATGAAAAGTATCACTCCAACTGGAGGACAAT^ 

GCCAACGACTTTAATCTGATGAAGCATCTCGTCTGGGCCTCTAGGGAGCTGGAGAGATTCGCTCTGAATCCCAGCC

TGGTGGAGACATCCGAAGGCTGTCAGGAAATTGCTGAGGAAGAGATTATCATTAGGTCCGAGAATTTCACAAACAA 

TGTCAAAACCATTATCGTCCAACTCAACGAAAGCGTCGAGATTAACATGGGCGCTAGGGCTAGTGTCCTCAGAGGC 

GGCAAGCTGGACGCCTGGGAAAAGATTAGGCTCAGGCCTGGCGGAAAGAAAAAGTATAGGCTCAAGGAGAAGGGAG 

GCCTGGAGGGACTGGTTTACTCCAAAAAGAGGCAAGACATTCTGGATCTGTGGGTGTATAACACACAGGGATTCAC 

TAGATGGGGAACCATGATCCTCGGCTTGGTGATTATCTGTAGCGCCAGCGAGAATCTGTGGGTGACAGTGTATTAC 

GGAGTGCCTGTGTGGAGGAGACAGCTCCTGTCCGGCATTGTGCAACAACAAAATAACCTCCTGAGGGCTATCGAAG 

CCCAACAGCATCTGCTCCAGCTCACCGTCTGGGTCAGGCATTTCCCCAGGCCTTGGCTCCACGGCCTGGGACAGTA 

CATCTATGAGACATACGGAGACACATGGGCGGGAGTGGAAGCCCTCACAGCCCTCATCACACCCAAAAAGATTAGG 

CCTCCCCTCCCATCCGTGAAAAAGCTCACCGAAGACAGATGGAATGAGCCTCAAAAGACATATAGCGCTGGCGAAA 

GGATTATCGATATCATTGCATCCGACATTCAGACTAAGGAACTGCAAAAGCAAATCCTAAAGATTCAGAATTTCGC 

TGTGTTTATCCATAACTTTAAGAGGAAGGGAGGCATTGGCGGCTACTCCGCCGGAGAGAGAATCATTGACATTATC

GCCACCGATATCATTCCCGTGGGCGAAATCTATAAGAGATGGATCATTCTGGGACTCAACAAAATCGTGAGAATGT 

ATCTACCCGTCAGCATTCTGGATATCAGAGTGAGACAGGGATACTCCCCCCTCAGCTTTCAGACACTGCTGCCCGC 

TCCCAGAGGCCCTGACAGACTCGGAGGCATTGAGGAAGAGTCCAGCCAGGACCATCAGTATCCCATTCCCGAACAG 

CCTCTGCCTCAGACAAGGGGAGACi^TCCCACAGACCCTAAGGAAAGCAAAAAGGCTAGTGGAGGGGTCGAGTCCA 

TGAATAAGGAACTGAAAAAGATTATCGGACAGGTCAGGGACCAGGCTGAGCACCTGAAAACCGCTGTGCAAATGG 

TGCCATGCAGATGCTCAAGGATACCATTAACGAAGAGGCTGCCGAGTGGGACAGAGTCCATCCCGTCCATGCCGGG 

CCCGTTCCCCCTCTCACCGAGATTTGTAAAGAAATGGAAAAAGAAGGCAAAATCTCCAAGATTGGCCCTGAGAATC 

CCTATAACACACCC^TGTTTGCGATTCAAGTGAGAGAGCAAGCGGAACACCTCAAGACAGCCGTC^ 

CTTCATTCACAATTTCAAAAGGAGAGGCGGAATCGGAGGCAAAAAGAAAGATAGCACAAAGTGGAGGAAACTGGTA 

GACTTTAGGGAGCTCAACAAACGTACACAGGATTTCTGGGAGGTCCAGCTCGGCTTTTTGGCTCTGGCTTGGGATG 

ACCTCAGGAGCCTGTGTCTGTTCAGCTATCACAGACTGAGAGACTTTATCGTCATCGTTGCCAGAATGTGCCGACA 

TAGCAGAATCGGCATCACTAGGCAACGTAGAGGTAGGAACGGCGCCTCCAGTTCCGCTGCCCCCAAAATCTCCTTC 

GACCCCATTCCCATTCACTATTGCGCTCCCGCTGGCTTCGCTATCCTCAAGTGTAACGATAAGAACTTCAATGGCG 

AAGAGGATTGGCATCTGGGACAGGGAGTGTCCATCQAATGGAGACAGAAAAGCTATAGCACACAGGTG 

CCTCGCCGATCAGCCTAGCCTCTATCCTCCCTTAGCTTCCCTGAAAAGCCTCTTCGGAAACGATCCCTTATCCCAA 

GCCGCTAGAAGGGCTATCCTCGGCCATATAGTCAGGAGAAGGTGTGAGTATCAGTCCGGACACAATAAGGTCGGCT 

CCCTGCAATACCTCGCACTCAGTCAACCCACAACCGCTTGCTACAAGTGTTACTGTAAGAAATGTTGCTTCC^ 

TCAGGTCTGCTTCCTGAAGAAGGGACTGGGAATCAGGGATTACGGAAAGCAAATGGCTGGCGATGACTGTGTGGCC 

AGCAGGCAAGACGAAGACGCAGCCAAGTACCATAGCAATTGGAGAACCATTGGCAATGAGTTTAACCTCCCCCCTA 

TCGTCCCTAAGGAAATCGTCGCAAATTGCAATAAGTGTAACGAATGGACACTGGAACTGCTGGAGGAACTGAAACA 

TGAAGCCGTGAGACACTTTCCCAGACCCTGGCTGCATGGCCTCGGTCAACACGATATCATTAGCCTCTGGGATCAG 
TCCCTGAAACCCTGTGTGAAACTGACACCCCTCTGCGTCACCCTCAACTGTACCAATGCCAATCTGATGAAGAGAT

ACTCCACCCAAGTGGACCCCGATCTGGCTGACCAACTGATTCACCTCCACTATTTCGATTGCTTTGCCGATAGCGC 

AATCCATCCCATCGGCCAACACGGAATGGAGGATGAGGATAGGGAAGTGCTGAAATGGAAATTCGATAGCCATCTG 

GCTCTCAGGCATATCGCTTCTAGTCCTATCGATACCGTCCCCGTCAAGCTCAAGCCTGGCATGGACGGACCCAAAG 

TGAAACACTGGCCCCTCACCGAAGAGAAAATCAAAGCCATTTGGCCTAGCAACAAGGGAAGGCCTGGC^ 

GCAGTCCAGGCCTGAGCCTACCGCACCCCCAGCCGAGAGCTTTAGATTCGGCATTAGCAAAAAGGCTAAGGGATGG 

TTTTACAGACACCATTACGATAGCCGACACCCTAAGGTCAGCTCCGAGGTCCACATTCCCCTCGGCATGATGACCG 

CTTGCCAAGGCGTCGGCGGACCCAGTCACAAAGCCAGGGTACTGGCAGAGGCTATATCCCAGGTGAACAACGCTAA 

CATTCCTCCCATTGTGGCCAAAGAGATTGTGGCAAACTGTGACAAATGCCAGCTGAAGAGTGAGGCTATTCACGGA 

CAGGTGAACTGTAGCCCTTCCGAGGGAACAAGACAGACTAGGAAGAACAGACGTAGAAGGTGGCGTGCGAGGCAAA

GGCAAATCCACTCCATCTCCGAGAGGATTCTGGGACAGATGAGGGAACCCAGAGGCTCCGACATTGCCGGTACTAC 

AAGCACACTGCAAGAGCAAATCGCATGGATGACAAGCAATCCCCCTAGCATTCAACAAGAGTTTGGCATTCCCTAT 

AACCCTCAGTCCCAGGGCGTCGTGGAAAGCATGAACAAAGAGCTAAAGAAAATCATTGGCAGACAGGAGATCCTCG 

ATCTCTGGGTGTACCATACCCAAGGCTATTTCCCTGACTGGCAGAATTACACACCCGGACCCGGAGTCAGATACCC 

TAGCAGAGAAAGACAGAGACAGATTCATTCTATTAACGAATGGATTCTCAGCAACTGCCTCGGCAGATCCGCTGAG 

CCTGTGCCTCTGCAACTGTATAAGACACTGAGAGCCGAACAGGCTACCCAAGAGGTCAAGAATTGGATGACCGAGA 

CACTGCTCGTGCAAAACGCTAACCCTGACTGTGAGAGAGTGTATCTGGCTTGGGTCCCCGCTCATAAAGGCATTGG 

CGGAAACGAACAGGTGGACAAACTGGTCAGCGCTGGCATTAGGAAAACAGACCCTAACCCTCAGGAAATCCATCTG 

GAAAACGTCACCGAGAACTTTAACATGTGGAAAAACGATATGGTGGAGCAAATGCATGAGGCTGGCTATGCCATTC 

TGAAATGCAATAACAAAAGGTTCAACGGAACTGGACCCAGTAAGAATGTGTCCACCGTCCAGTGTACCCATGGCCT 

AGAGCTCAAGAATAGCGCTATCTCCCTGCTCAACGCTACCGCTATCGCTGTGGCTGGGTGGACCGATAGGGTTATC 

GAAGTGGTTCAGTCCCGGCATCCCAAAGTGTCCAGCGAAGTGCATATCCCTCTGGGAGACGCTAGGCTCATCATTA 

GGACATACTGGGGCCTCCACACAGGCGCTGCTATGGGCGGTAAATGGTCCAAGTGCTCCCTCGTCGGATGGCCCGC 

AGTGAGAGAGAGAATCAGACAGACACCCCCTGCCGCTGAGGGAGTGCTCAAGACCGGCAAGTACTCTAGGAAGAGG 

GGTGCGCATACCAATGACGTCAAGCAACTGACAGAGGCTGTGCAAAAGATTGCCACAGAGTCTAGCTGGGAGGGTC 

TGAAATACTGGGGGAATCTGCTCCAGTACTGGGGCCAGGAACTGAAAATCTCCGCCGTCAGCCTCCTGAATGCCAC 

AGCCATTGAGCTGCCTGAGAAAGAAAGCTGGACCGTCAACGATATCCTiAAAGCTCGTGGGAAAGCTCAACTGGGCA 

TCCCAGATTTACCCCGGAAGAGCCATTGAGGCTCAGCAACACATGCTGCAACTGACAGTGTGGGGCATTAAGCAAC 

TGCAAGCCAGAGTGCTCGCCATTGAGAGATACCTCGCCCTCCAGGATAGCGGATTGGAAGTGAATATCGTCACCGA 

TAGCCAATACGCTCTAGGCATCATTCAGGCTCAGCCTGACAAAAGCGAAAGGGAAATCTCCAACTATACCAATCAG 

ATTTACAAGATCCTCACCGAATCTCAAAATCAACAGGATAGGAATGAGAAAGACCTCCTGGCTCCCACAAAGGCTA

AGAGAAGGGTCGTGCAAAGGGAAAAGCGTGCCGTCGGCATTGGCGCTATGTTTCTCGGATTCCTCGGCGCTGCCAA 

ACCCAAAATGATCGGAGGCATTGGAGGCTTTATCAAAGTCAGGCAGTATGACCAAATCCTTATCGAAATCTGTGGA 

AACAAGGCTATCTCCTACCATAGGCTCAGGGATTTCATTCTGATCGTCGCTAGGATTGTGGAACTGCTCGGCCGTA 

GCTCCCTGAAAGGCCTCCAGAGAGGCACACTGAATGCCTGGGTGAAAGTGATTGAGGAAAAGGGATTCAGTCCCGA 

AGTGATTCCCATGTTTTCCGCTCTGTCCGAGGGAGCCACACTCGAGtgaagatctgaattc 
B2 fragment

ggatCcaccATGCTCGAGAATATGCTCACCCAAATCGGATGCACACTGAATTTCCCTATCTCCCCCATTGACACAG 
TGCCTGTGAAACTGAAACCCGGAATGGATGGCGCCGCCATCTTTAGGCCTGGCGGAGGCAATATGAAAGACAATTG 
GAGAAGCGAACTGTATAAGTATAAGGTCGTGAAGATTAAGCCTCTGGGAATCACATGGATTCCCGAATGGGAGTTC 
GTCAACACACCCCCACTGGTCAAGCTATGGTATCAGCTGGAGAAAGAGCCTATCGTTGGCGCTGAGCCTCAGGATC
TCAACACGATGCCGAATACTGTAGGAGGCCATCAGGCTGCTATGCAAATGCTGAAAGACACAATCAATGAGGAAGC 
CGCTGTCCTGTTTCTGGATGGCATTAACAAAGCTCAAGAGGAACATGAGAAGTATCACTCCAACTGGAGGACAATG 
GCCAACGACTTTAATCTGATGAAGCATCTCGTCTGGGCCTCTAGGGAGCTGGAGAGATTCGCTCTGAATCCCGGCC 

''"rjGAGACATCCGAAGGCTGTAAGCAAATTGCTGAGGAAGAGATTATCATTAGGTCCGAGAATTTCA 
TGfCAAAACCATTATCGTCCACCTCAAOSAAAGCGTCGAGATTAACATGGGCGCTAGGGCAAGTGTCCTCAGCTC 
GGCAAGCTGGACGCCTGGGAAAAGATTAGGCTCAGGCCTGGCGGCAAGAAAAAGTATAGGCTCAAGGAGAAGGGAG 
GCCTGGACGGACTGATTTACTCCCAAAAGAGGCAAGACATTCTGGATCTGTGGGTGTATAACACACAGGGATTCAC 
TAGATGGGGAACCTTGATCCTCGGCTTGGTGATTATGTGTAGCGCCAGCGAGAATCTGTGGGTGACAGTGTATTAC 
GGAGTGCCTGTGTGGAGGAGACAGCTCCTGTCCGGCATTGTGCAACAGCAAAATAACCTCCTGAGGGCTATCGAAG 
CCCAACAGCATCTGCTCCAGCTCACCGTCTGGGTCAGGCATTTCCCCAGGCCTTGGCTCCACAGCCTGGGACAGTA 
CATCTATGAGACATACGGAGACACATGGTCGGGAGTGGAAGCCCTCAAAGCCCTCATCAAACCCAAAAAGATTAAG 
CCTCCCCTCCCATCCGTGAAAAAGCTCACCGAAGACAAATGGAATAAGCCTCAAAAGACATATAGCGCTGGCGAAA 
GGATTGTCGATATCATTGCAACCGACATTCAGACTAAGGAACTGCAAAACCAAATCATAA^ 

TGTGTTTATCCATAACTTTAAGAGGAAGGGAGGCATTGGCGGCTACTCCGCCGGAGAGAGAATCATTGACATTATC 

GCCAGCGATATCGTTCCCGTGGGCGATATCTATAAGAGATGGATCATTCTGGGACTCAACAAAATCGTGAGAATGT 

ATTCACCCGTCAGCATTCTGGATATCAGAGTGAGACAGGGATACTCCCCCCTCAGCTTTCAGACACTGATGCCCGC 

TCCCAGAGGCCCTGACAGACTCGAACGCATTGAGGAAGAGTCCAGGCyVGGACCaVTCAGTATCCCATTTCCGA^ 

CCTCTGTCTCAGACAAGGGGAGACAATCCCACmGACCCTAAGGAAAGCAAAAAGGCTAGTGGAGTGGTCGAGTCCA 

TGAATAAGGAACTGAAAAAGATTATCGGACAGOTCAGGGACCAGGCTGAGCACCTGAAAACCGCTGTGCAAATGGC 

TGCCATGCAGATGCTCAAGGATACCATTAACGAAGAGGCTGCCGAGTGGGACAGAATCCATCCCGTCCATGCCGGA 

CCCATTGCCCCTCTCACCGAGATTTGTAAAGAAATGGAAAAAGAAGGCAAAATCTCCAGGATTGGCCCTGAGAATC 

CCTATAACACACCCGTCTTTGCCATTCAAGTGAGAGACCAAGCCGAACACCTCAAGACAGCCGTCCAGATGGCAGT 

CTTCATTCACAATTTCAAAAGGAAAGGCGGAATCGGAGGCAAAAAGAAAGATAGCACaAAGTGGAGGAAACTGGTT 

GACTTTAGGGAGCTCAACAAACGTACACAGGATTTCTGGGAGGTCCAGCTCGGCTTTTCGGCTCTGGCTTGGGATG 

ACCTCAGGAGCCTGTGTCTGTTCAGCTATCACAGACTGAGAGACTTTATCCTCATCGTTGCCAGAACCTGCCGACA 

TAGCAGAATCGGCATCACTAGGCAACGTAGAGGTAGGAACGGCTCCTCCAGGTCCGCTGCCCCCAAAATCTCCTTC 

GACCCCATTCCCATTCACTATTGCGCTCCCGCTGGCTTCGCTATCCTCAAGTGTAACAATAAGACATTCAATGGCG 

AAAAGGATTGGCATCTGGGACAGGGAGTGTCCATCGAATGGAGAAAGAAAAGCTATAGCACACAGGTGGACCCTO 

CCTCGCCGATCAGCCTAGCCTCTATCCTCCCTTAGCTTCCCTGAAAAGCCTCTTCGGAAACGATCCCTCATCCCAA 

GCCGCTAGAAGGGCTATCCTCGGCCAAATAGTCAGGAGAAGGTGTGAGTATCAGTCCGGACACAATAAGGTCGGCT 

CCCTGCAATACCTTGCACTCSVGCCAACCCAAAACCGCTTGCTACAAGTGTTACTGTAAGAAATGTTGCTACCACTG 

TCAGGTCTGCTTCCTGAAGAAGGGACTGGGAATCAGGGATTACGGAAAGCAAATCGCTGGCGCTGACTGTGTGGCC 

AGCAGGCAAGACGAAGACGCAGCCAAGTACCATAGCAATTGGAGAACCATGGCCAGTGAGTTTAACCTCCCCCCTA 

TCGTCGCTAAGGAAATCGTCGCAAGTTGTGATAAGTGTAACGAATGGACACTGGAACTGCTGGAGGAACTGAAACA 

TGAAGCCGTGAGACACTTTCCCAGACCCTGGCTGCATGGCCTCGGTCAACACGATATCATTAGCCTCTGGGATCAG 
TCCCTGAAACCCTGTGTGA?\ACTGACACCCCTCTGCGTCACCCTCAACTGTACCAATGCCAATCTGCTGAAGAGCT

ACTCCACCCAAGTGGACCCCGATCTGGCTGACCATCTGATTCACCTCCACTATTTCGATTGCTTTTCCGATAGCGC 

AATCCATCCCATGGGCCTACACGGAATGGAGGATGAGGAAAGGGAAGTGCTGAAATGGAAATTCGATAGCCATCTG 

GCTCTCAGGCATATCGCTTCTAGTCCTATCGATACCGTCCCCGTCAAGCTCAAGCCTGGCATGGACGGACCCAAAG 

TGAAACAGTGGCCCCTCACCGAAGAGAAAATCAAAGCCATTTGGCCTAGC?^CAAGGGAGGGCCTGGCAATTT^ 

GCAGTCCAGGCCTGAGCCTACCGCACCCCCAGCCGAGAACTTTAGATTCGGCATTAGCAAAAAGGCTAAGGGATGG 

TTTTACAGACACCATTACGAAAGCCAACACCCTAAGGTCAGCTCCGAGGTCCACATTCCCCTCAGCATGATGACCG 

CTTGCCAAGGCGTCGGCGGACCCAGTCACAAAGCCAGGGTACTGGCAGAGGCTATGTCCCAGGTGAACAACGCTAA 

CATTCCTCCCATTGTGCCCAAAGAGATTGTGGCAAACTGTGACAAATGCCAGCTCAAGGGTGAGGCTATGCACGGA 

CAGGTGGACTGTAGCCCTTCCGAGGGATGAAGACAGGCTAGGAAGAACAGACGTAGAAGGTGGCGTGAGAGGCAAA 

GGCAAATCCGCGCCATCTCCGAGTGGATTCTGGGACAGATAAGGGAACCCAGAGGCTCCGACATTGCCGGTACCAC 

AAGCACACTGCAAGAGCAAATCGCATGGATGACAAACAATCCCCCTGGCATTAAGCAAGAGTTTGGCATTCCCTAT

AACCCTCAGTCCCAGGGCGTCGTGGAAAGCATGAACAAAGAGCTCAAGAAAATCATTGGCAGACAGGAGATCCTCG 

ATCTCTGGGTCTACAATACCCAAGGCTTTTTCCCTGACTGGCAGAATTACAGACCCGGACCCGGAATCAGATACCC 

TAGCAGAGCAAGACAGAGACAGATTCATGCTATTAGCGAAAGGATTCTCAGCAACTTCCTCGGCAGACCCGCTGAG 

CCTGTGCCTCTGCAACTGTATAAGACACTGAGAGCCGAACAGGCTACCCAAGAGGTCAAGAATTGGATGACCGACA 

CACTGCTCGTGCAAAACGCAAACCCTGACTGTGAGAAAGTGTATCTGGCTTGGGTCCCCGCTCATAAAGGCATTGG 

CGGAAACGAACAGGTGGACAAACTGGTCAGCGCTGGCATTAGGAAAACAGACCCTAACCCTCAGGAAATCGATCTG 

GAAAACGTCACCGAGAACTTTAACATGTGGAAAAACAATATGGTGGAGCAAATGCAAGAGGCTGGCTATGCCATTC 

TGAAATGCAATAACAAAAAGTTCAACGGAACTGGACCCTGTAAGAATGTGTCCACCGTCCAGTGTACCCATGGCCT 

AGAGCTCAAGAATAGCGCTGTCTCCCTGCTCAACGCTACCGCTATCGCTGTGGCTGAGTGGACCGATAGGGTTATC 

GAAGTGGTTCAGTCCCAGCATCCCAAAGTGTCCAGCGAAGTGCATATCCCTCTGGGAGACGCTAGGCTCGTCATTA

AGACATACTGGGGCCTCCACACAGGCGCTGCTATGGGCGGTAAATGGTCCAAGTGCTCCCTCGTCGGATGGCCCGC 

AGTGAGAGAGAGAATCAGACAGACACCCCCTGCCGCTGAGGGAGTGCTCAAGACCGGCAAGTACTCCAGGATGAGG

AGTGCCCATACCAATGACGTCAAGCAACTGACAGAGGTTGTGCAAAAGATTGCCACAGAGTCTAGCTGGGAGGGTC 

TGAAATACTTGTGGAATCTGCTCCTGTACTGGGGCCTGGAACTGAAAAACTCCGCCGTCAGCCTCCTGAATGCCAC 

AGCCATTGTGCTGCCTGAGAAAGAAGGCTGGACCGTCAACGATATCCAAAAGCTCGTGGGAAAGCTCAACTGGGCA 

TCCCAGATTTACGCCGGAAGAGCCATTGAGGCTCAGCAACACTTGCTGCAACTGACAGTGTGGGGCATTAAGCAAC 

TGCAAGCCAGAGTGCTCGCCATTGAGAGATACCTCGCCCTCCAGGATAGCGGATCGGAAGTGAATATCGTCACCGA 

TAGCCAATACGCTCTAGGCATCATTCAGGCTCAGGCTGACAAAAGCGAAAGGGAAATCTCCAACTATACCAATCAG 

ATTTACAAGATCCTCACCGAATCTCAAAATCAACAGGATAGGAATGAGCAAGAACTCCTGGCTCCCACAAAGGCTA 

AGAGAAGGGTCGTGCAAAGGGAAAAGCGTGCCGTCGGCATTGGCGCTATGTTTTTCGGATTCCTCGGCGCTGCCAA 

ACCCAAAATGATCGGAGGCATTGGAGGCTTTATCAAAGTCAGGCAGTATGACCAAATCCTTATCGAAATCTGTGGA 

CAGAAGGCTATCTCCTACCATAGGCTCAGGGATTTCATTCTGATCGTCGCTAGGATTGTGGAACTGCTCGGCCATA 

GCTCCCTGAGAGGCCTCCGGAGAGGCACACTGAATGCCTGGGTG2Wi.GTGGTTGAGGAAAAGGGATTCAATCCCGA 

AGTGATTCCCATGTTTACCGCTCTGTCCGAGGGAGCCACACTGGAGtgaagatctgaattc 
C1 fragment

ggatccaccATGCTCGAGAGCAACACACCCGCTAATAATGCCGATTGCGCGTGGCTGAAAGCCCAGGAAGAGGAAG 

AAGTGGGATTTCCTGTGAGACCCCAAGTGCCTAGAGCTTGGAGGGCTATCCTCAACATTCCCAGGAGGATTAGGCA 

AGGCTTTGAGAGAGCCCTCCTAGCCGCCGAATGGGACAGGGTTCACCCTGTGCACGCTGGCCCTGTCGCTCCCGGC 

CAAATGAGAGAGCCCAGAGGAAGCGATATCGCTGGCACAACCCTCAGGCCCATGACATATAAGGCCGCTATTGACC 

TCAGCTTGTTTCTGAAAGAGAAAGGCGGACTGGAAGGCCTCATCTATAGCAAGAAAGCTGCTATGGAACAGGCTCC 

CGAAGACCAAAGCCCTCAGAGAGAGCCTTACAATGAGTGGACCCTGGAGCTCCTGGAAGAGCTCAAGAAAGAGGCT 

CAAGGCCAATGGACCTACCAAATCTTTCAGGAACCCTTTAAGAATCTGAAAACCGGAAAGTATTCCAGAATGAGA^ 

GCGCTCACACaAACTGGATGACAGAAACCCTCCTGGTCCAGAATGCCAATCCCGATTGCAAGTCCATCC^^ 

TCTGGGAACCGGAGCCACACTGGAAGAGCCTGAGGTCATCCCTATGTTCTCAGCCCTCAGCGAAGGCGCTACCCCC 

CAAGACCTGAATACGATGCTCAACATCGTCAGCGGACACCAATCCACCCTCCAGGAACAGATTGGCTGGATGACAA 

ATAACCCTCCCATCCCTGTCGGAGAGATTTACAAAAGGTGGATTATCCTCGGCCTGACTAGAATCCCCCATCCCGC 

CGGCCTCAAGAAAAAGAAAAGCGTCACCGTCCTGGATGTGGGAGACGCTTACTTCAGCGTCCCCCTCGACGAAGAC 

CAAAAGGAAACCTGGGAGGCTTGGTGGACGGAATACTGGCAGGCTACCTGGATTCCTGAGTGGGAGTTTGTGAATA 

CCCCTCCCCTCGTGTTTCCCGATTGGCATAACTATACCCCTGGCCCTGGCATAAGGTATCCCCTCACCTTTGGATG 

GTGCTTTAAGCTCGTGCCTGTGGACCCCA?^CTGTGGTACCAACTGGAAAAGGAACCCATTGTCGGAGCCGAAACC 

TTTTACGTGGACGGAGCCGCCAACAGAGAGACAAAGCTCGGCCAAAACGTCCAGGGACAGATGGTGCATCAGGCTA 

TTAGCCCCAGGACCCTCAACGCTTGGGTCAAGGTCGTCGAAGAGAAAGCCTTTAACGAAACCGAAGTGCATAACGT 

CTGGGCTACCCATGCCTGTGTGCGTACCGATCCCAATCCCCAAGAGATTCTCCTGGAGAATGTGACAGAGCTCAAG 

GATCAGAAACTCCTCGGCATTTGGGGATGCTCCGGCAAAATCATTTGCACAACCACTGTGCCTTGGAACAGCTCCT 

GGTCCAACCAAGCTGGCCATAACAAAGTGGGAAGCCTCCAGTATCTGGCTCTGACGGCTCTGATTAAGCCTAAGAA 

AATCAAACCCCCTCTGCCTAGCGTTAAGACAATCATTGTGCATCTGAATGAGTCCGTGGAAATCAA^ 

CCTAACAATAACACAAGGAAAGCCGCCGCTAGTGAAGTACGGAATAAGTCCAAA 

CCGATACAGGCGACTCCAGCCAGGTO^GCCAAAACTATCCCATTGTGTCCS^CTTTACCTCCACCACTGTGAAAG 

CGCTTGTTGGTGGGCCAATATCAAACAGGAGTTTGGAATCCCTTACAATCCCCAAAGCCAAACATTCTATGTGGAT 

GGCGCTGCCAATAGGGAAACCCAACTGGGAAAGGCGGGCTATGTGACAGACAAAGGCAGACAGAAAGTCATTAGCG 

GAATCTGGCAGCTCGACTGTACCCATCTGGAAGGCAAAGTCATTCTGGTAGCCGTCCACGTCGCCTCCGGCTACAT

TGAGGCTGAGGTCGGCAATGAGCAAGTGGATAAGCTCGTGAGTTCCGGAATCAGAAAGGTGCTATTCCTCGACGGA 

ATCAATAAGGCTCAGGAAGAGCACGAAGTCAGGGAAAGGATTAGGCGAACCGCTCCCGCTGCTGAAGGCGTCGGCG 

CTGTCTCCCAGGATCTGGATAAGTACGGAGCCCTCACCTGCACAAGCGGAACCCAACAGTCCCAGGGAACTGAAAC 

TGGCGTCGGCAACCCTCAGATTTTGGGAGAGTCCAGCGTTGTCCTCGGCTCCGGCTCCATCGTCATCTGGGGTAAA 

ACCCCTAAGTTTAAGTTCCCCATTCAGAAAGAGACATGGGAAGCCTGGTGGACGGAGTATTGGCAAGCCGCTGCTT 

ACAGACTGATCAGCTGTAACACAAGCGTTATCAAACAGGCTTGCCCTAAGATTACCTTTGACCCTATCCCTATCCA 

TTACTGTGCCCCTCCTAGCTGGATGGGCTATGAGCTCCACCCTGACAGATGGACAGTGCAACCCATCGTGCTCCCC 

GAAAAGGACTCCTGGACAGTGAATGACATTCAGAAATCAATTCTGAGAGCCCTCGGCCCAGGCGCTTCCCTGGAGG 

AAATGATGACAGCATGTCAGGGAGTGGGAGGCCCTGGCCATAAGGCTAGAGTGTATTACAGAGACTGCAGGGACCC 

CATTTGGAAAGGCCCTGCCAAACTGCTCTGGAAAGGCGAAGGCGCTGTGGTCATCCAAGACATTAAGATT 

CAACTGATAGAAGCCCTCCTGGATACAGGAGCCGATGACACCGTCCTGGAAGATATGAATCTGCCTGGCAAGTGGG 

GAATCAAACAGCTCCAGGCTAGGGTCCTGGCTATCGAGAGGTATCTGAAAGATCAACAGTTTCTGGGACTCTGGGG 

CTGTAGCGGAAAGGCTGCTATGGAAAACAGATGGCAAGTGATGATCGTCTGGCAAGTGGACAGGATGAAGATTAGG 
ACATGGAATAGCCTCGTGAAACACCATATGTATATTATCTGTACCACAACCGTCCCCTGGAACTCCACCTGGAGCA

ATAAGTCCTTGGAAGAGATTTGGAATAACATGACCTGGATTCAATGGCTGATTCTCGCTATCGTCGTGTGGACCAT 

TGTGTATATCGAATACAAGAAACTGCTCAGGCAAAGGAGAATCGATAGGCTCATCAAAAGGCTCAACCCTGGCCTC 

CTGGAAACCGCTGAGGGATGTAAACAGATCCTGGAACAGCTCCAGCCCGCCCTCCAGACAGGCACCGAAGAGCTCT 

CTAGTAGAAAGCTCCTGAAACAGAGAAAGATTGACAGACTGATTGAGAGAATCAGAGAGAGAGCCGAAGACTCCGG

CAATGAGTCCGAGGGAGACACACCCGGAATCAGATACCAATACAATGTGCTCCCCCAAGGCTGGAAGGGCTCCCCA 

CCCATTTTCCAAAGCTCCATGACCCAAATCCTCATGATGCAAAGGGGAAACTTTAAGGGACAGAAAAGGATTATCA 

AGTGCTTCAACTGTGGAAAGGAAGGCCATCTCGCTAGGAATTGCAGACCTCCCCTAGAGAGACTGAACCTGGATTG 

CTCCGAGGATAGCGACACCTCCGGCACACAGCAAAGCCAAGGCACAGAGACAGAAGTGGGACTCGTGGCTGTGCAT 

GTGGCCAGCGGATATATCGAAGCCGAAGTGATCCCTGCCGAAACTGGACAGGAAACCGCTTACTTTATCCTCAAGA 

TTAAGGCTGTGGTCAGCACACAGCTCCTGCTCAACGGTAGCCTCGCTGAAGAGGAAATCATTATCAGAAGCGAAAA 

CTTTACCGATAACAAACTGGTCGGCAAACTGAATTGGGCTTCCGAAATCTACGCTGGCATCAAAGTGAAGCAACTG 

TGTAAGCTCCTGAGAGGCACCAAAGCCCTCACTCCTCTGTGTGTGACACTGAATTGCACAAACGCTAACCTCATCA 

ATGTGAATGCTGCTCAAACCAGAGGCGATAACCCTACCGGTCCCGAAGAGTCCAAGAAAGAGGTCGCGTCCAAGAC 

AGAGACAGACCCTTGTGACGCCGCCCCTAGCTCCAACTTTCTGGGAAGGTCTGCCGAACCCGTCCCCCTCCAGCCC 

CCCCCTCTGGAAAGGCTCCACCTCGACTGTAGCGAAGACTGTGGCGAACTGGATAAGTGGGCCTCCCTGTGGAACT 

GGTTCAATATCACCAACTGGCTGTGGTACATTAAGATTTTCATTATGATTGTGGGAGGCAATAAGATTGTCAGGAT 

GTACTCACCTGTCTCCATCCTCGACATTAAGCAAGGCCCTAAGGAACCCTTCAGGGATTACGTGGACAGATTCGCT 

AAGCTCCTGTGGAAGGGAGAGGGAGCCGTCGTGATTCAGGACAACTCCGACATTAAGGTCGTGCCCAGGAGAAAGG

CTAAGATTATCGAACTGAATAAGAGAACCCAAGACTTTTGTGAAGTGCAACTGGGAATCCCTCACCCTGCTGGACT 

GAAGAAGAAAAAGTCAGTGACAGTGGCCGCTATGAGAGTGAAAGAGACACAGATGAACTGGCCCAATCTGTGGAAG 

TGGGGCACAATGATTCTGGGACTGGTCATCATTTGCTCCGCCTCCATTAAGGTCAGACAGCTCTGCAAACTGCTCA 

GGGGTACAAAGGCTCTGACAGAGATTGTGACACTGACAGAGGAAGCCGAACTGGAACTGCTCATATGGAAGTTTGA 

CTCCCGCCTCGCCCTGAGACATATCGCCAGGGAACTGCATCCCGAGTTCTACAAAGACTGCGCTGCTGTCGAGCTC 

CTGGGACGCTCCAGCCTCAAGGGACTGCAAAGGGGATGGGAAGGCCTCAAGTATTTGTGGAACCTCCTGCAGTATT 

GGGGCTCTAGCCTGGGGCAACTGCAACCTGCTCTGAAAACCGGATCAGAGGAACTGAAGTCCCTGTATAACACAAT 

CGCTACCCTCTGGTGTGTGCATCAGGAGCTCTACAAATACAAAGTGGTCAAAATC^ 

ACCAGAGCCAAAAGGAGAGTGGTCGAGAGAGAGAAAAGGCTCACCGAAATCGTCCCACTCACCGAAGAGGCTGAGC
TGGAGCTGGAGGAAAACAGAGAGATTCTGAGGGAACCCGTCCACGGAGTGTATAGAGTGCTCGCCGAAGCCATGAG 
CCAAGTCAACAATGCCAACATCATGATGCAGAGAGGCAATTTCAAAGGCCTAAAGAGAATCATCAAACAAGAGGAA 
GAGGAGGTCGGCTTCCCCGTCAGGCCCCAGGTCCCACTGAGACCTATGACCTACAAAGGAGCCGTCGATCTGTCCT 
TCTTCAGACAGGGACCCAAAGAGCCTTTCAGAGACTATGTGGATAGGTTTTTCAAAACGCTCAGGGCTGAGCAAGC
CTCACAGGAAGTGAAAAACTGGGAGAAAATCAGACTGAGACCTGGTGGCAAAAAGAAATACAAAATGAAACACATT 
GTGTGGGCCTCCAGGGAACTGGAAAGGTTTGCCTCCCAGTATGCCCTCGGCATCATCGTAGCCCAACCCGATAAGT 
CCGAGTCCGAGCTCGTGAATCAGATTATCGAAGAGCTCATCAAGAAGATTGCCGTCGCCGGATGGACAGACAGAAT 
CATTGAGGTCGACCAAAGGGCTTGGAGAGCCATTCTGAATATCCCCAGGAGAATCAGACAGACTAGACTCGCCGGA 
AGGTGGCCCGTCAGGACAATCTATACCGATAACGGAAGCAATTTCACAAGCGCTACCGTCAAGGCTGCCTGCTGGT
GGGCTGATGTGAAACAGCTCACCGCAGTCGTCCAGAAAATCGCTACCGAAAGCATTGTGATATGGGGAAAGACGGC 
CAAGTTCAGACTGCCTATCGCTGCCGCCAGCAACGAGAACATGGAGACCATGGCTGCTtgaagatctgaattc 
C2 fragment

ggatccaccATGCTCGAGAGCAACACAGCCGCTAACAATACCGATTGCGTGTGGCTGAAAGCCCAGGAAGAGGAAG 

AAGTGGGATTTCCTGTGAGACCCCAAGTGCCTAGAGCCGGGAGGGCTATCCTCAACATTCCCACGAGGATTAGGCA 

AGGCCTTGAGAGAGCCCTCGTAGCCGCCGAATGGGATAGGATTCACCCTGTGCACGCTGGCCCTATCGCTCCCGGC 

CAAATGAGAGAGCCCAGGGGAAGCGATATCGCTGGCACAACCCTCAGGCCCATGACATATAAGGCCGCTATTGACC 

TCAGCTTGTTTCTGAAAGAGAAAGGCGGACTGGATGGCCTCATCTATAGCAAGAAAGCTGCTATGGAACAGGCTCC 

CGAAGACCAAAGCTCTCAGAGAGAGCCTTACAATGAGTGGACCCTGGAGCTCCTGGAAGAGCTCAAGCACGAGGCT 

CAAGGCCAATGGACCTTCCAAATCTTTCAGGAACCCTTTAAGAATCTGAAAACCGGAAAGTATGCCAGA^ 

GCGCTCACACAAACTGGATGACAGATACCCTCCTGGTCCAGAATGCCAATCCCGATTGCAAGTCCATCCTC^ 

TCTGGGACCCGGAGCCTCACTGGAAGAGCCTGAGGTCATCCCTATGTTCTCAGCCCTCAGCGAAGGCGCTACCCCC 

CAAGACGTGAATATGATGCTCAACACCGTCGGCGGACACCAATCCACCCTCCAGGAACAGATTGGCTGGATGACAA 

ATAACCCTCCCATCCCTGTCGGAGAGATTTACAAAAGGTGGATTATCCTCGGCCTGACTAGAATCCCCCATCCCGC 

CGGCCTCAAGAAAAAGAAAAGCGTCACCGTCCTGGATGTGGGAGACGCTTACTTCAGCGTCCCCCTCGACGAAGGC 

CAAAGGGAAACCTGGGAGGCTTGGTGGATGGAATACTGGCAGGCTACCTGGATTCCTGAGGGGGAGTTTGTGAATA 

CCCCTCCCCTCGTGTTTCCCGATTGGCAAAACTATACCCCTGGCCCTGGCACAAGGTATCCCCTCACCTTTGGATG 

GTGCTTTAAGCTCGTGCCTGTGGACCCCAAACTGTGGTACCAACTGGAAAAGGACCCCATTGTCGGAGTCGAAACC 

TTTTACGCGGACGGAGCGGCCAACAGAGAGACAAAGCTGGGCCAAAACGTCCAGGGACAGATGGTGCATCAGCCTA 

TTAGCCCCAGGACCCTCAACGCTTGGGTCAAGGTCATCGAAGAGAAAGGCTTTAGCGACACCGAAGTGCATAACGT 

CTGGGCTACCCATGCCTGTGTGCCTACCGATCCCAATCCCCAAGAGATTCTCCTGGAGAATGTGACAGAGCTCAAG 

GATCAGAAACTCCTCGGCATTTGGGGATGCTCCGGCAAACTCATTTGCACAACCACTGTGCCTTGGAACAGCTCCT 

GGTCCAACCCAGCTGGCCATAACAAAGTGGGAAGCCTCCAGTATCTGGCTCTGAAGGCTCTGATTACGCCTAAGAA 

AATCAAACCCCCTCTGCCTAGCGTTAAGACAATCATTGTGCATCTGAATGAGTCCGTGGAAATCAATTGCACAAGG 

CCTAACAATAACACAAGGACAGCCGCCGCTAGTGAAGTACAGAATAAGTCCAGACAGAAAACCCAGCAAGCCGCCG 

CCGATACAGGCAGCTCCAGCAAGGTCAGCCAAAACTATCCCATTGTGTCCAACTTTACCTCCACCACTGTGAAAGC

CGCTTGTTGGTGGGCCAATATCAAACAGGAGTTTGGAATCCCTTACAATCCCCAAAGCCGAACATTCTATGTGGAT 

GGCGCTGCCAATAGGGAAACCAAACTGGGAAAGGCTGGCTATGTGACAGACAGAGGCAGACAGAAAGTCGTTAGCG 

GAATCTGGCAGCTCGACTGTACCCATCTGAAAGGCAAAGTCATTCTGGTAGCCGTCCACGTCGCCTCCGGCTACAT

TGAGGCTGAGGTCGGCAATGAGCAAGTGGATAAGCTCGTGATTTCCGGAATCAGAAAGGTGCTATTCCTCGACGGA 

ATCGATAAGGCTCAGGAAGAGCACGAAGTCAGGGAAAGGATTAGGCGAGCCGCTCCCGCTGCTGAAGGCGTCGGCG 

CTGTCTCCCAGGATCTGGATAAGTACGGAGCCATCACCTCCACAAGCGGAACCCAACAGTCCCAGGGAACTGAAAC 

TGGCGTCGGCAACCCTCAGATTTTGGGAGAGTCCAGCGCTGTCCTCGGCTCCGGCTCCATCGTCATCTGGGGTAAA 

ACCCCTAAGTTTAAGCTCCCCATTCAGAAAGAGACATGGGAAACCTGGTGGATGGACTATTGGCAAGCCGCTGCTT 

ACAGACTGATCAGCTGTAACACAAGCGTTATCACACAGGCTTGCCCTAAGATTAGCTTTGAGCCTATCCCTATCCA 

TTACTGTGCCCCTCCTAGCTGGATGGGCTATGAGCTCCACCCTGACAGATGGACAGTGCAACCCATCGTGCTCCCC 

GAAAAGGAGTCCTGGACAGTGAATGACATTCAGAAAACAATTCTGAAAGCCCTCGGCCCAGGCGCTACCCTGGAGG 

AAAATATGACAGCTITGTCT^GGGAGTGGGAGGCCCTGGCCATAAGGCTAGAGTGTATTACAGAGACTCCAGGGACCC 

CATTTGGAAAGGCCCTGCCAAACTGCTCTGGAAAGGCGAAGGCGCTGTGGTCATCCAAGACATTAAGATTGGAGGC 

CAACTGAAAGAAGCCCTCCTGGATACAGGAGCCGATGACACCGTCCTGGAAGATATCAATCTGCCTGGCAAGTGGG 

GAATCAAACAGCTCCAGGCTAGGGTCCTGGCTATCGAGAGGTATCTGAAAGATCAACAGCTTCTGGGAATCTGGAG 

CTGTAGCGGAAAGGCTGCTATGGAAAACAGATGGCAAGTGATGATCGTCTGGCAAGTGGACAGGATGAAGATTAGG 
ACATGGAATAGCCTCGTGAAACACCATATGTATCTTATCTGTACCACAGCCGTCCCCTGGAACTCCACCTGGAGCA

ATAAGTCCTTCGAAGAGATTTGGAATAACATGACCTGGATTGAATGGCTGATTATCGCTATCGTCGTGTGGACCAT 

TGTGTTTATCGAATACAAGAAACTGCTCAGGCAAAGGAAAATCGATAGGCTCATCGAAAGGCTCAACCCTGGCCTC

CTGGAAACCGCTGAGGGATGTAAACAGATCCTGGAACAGCTCCAGCCCGCCCTCAAGGCAGGCACCGAAGAGGTCT 

CTAGTAGAAAGCTCCTGAGACAGAGAAAGATTGACAGACTGATTGAGAGAATCAGAGAGAGAGCCGAAGACTCCGG 

CAATGAGTCCGAGGGAGACACACCCGGAATCAGATACCAATACAATGTGCTCCCCCAAGGCTGGAAGGGCTCCCCA 

GCCATTTTCCAAAGCTGCATGACCAAAATCCTCATGATGCAAAGGGGAAACTTTAAGGGACAGAAAAGGATTATCA 

AGTGCTTCAACTGTGGAAAGGAAGGCCATCTCGCTAGGAATTGCAGACCTCCCCTGGAGAGACTGAACCTGGATTG 

CTCCGAGGATAGCGACACCTCCGGCACACAGCAAAGCCAAGGCACAGAGACAGGAGTGGGACTCGTGGCTGTGGAT 

GTGGCCAGCGGATATATCGAAGCCGAAGTGATCCCTGCCGAAACTGGACAGGAAACCGCTTACTTTCTCCTCAAGA 

TTAAGCCTGTGGTCAGCACACAGCTCCTGCTCAACGGTAGCCTCGCTGAAGAGGAAATCATTATCAGAAGCGAAAA 

CTTTACCAATAACAAACTGGTCGGCAAACTGAATTGGGCTTCCCAAATCTACCCTGGCATCAAAGTGAGGCAACTG 

TGTAAGCTCCTGAGAGGCACCAAAGCCCTCAGCCCTCTGTGTGTGACACTGAATTGCACAAACGCTAACCTCATCA 

ATGTGAATGCTGCTCAACCCAGAGGCGATAACCCTACCGATCCCAAAGAGTCTAAGAAAGAGGTCGCGTCCAAGGC 

AGAGACAGACCCTTTTGACGCCGCCCCTAGCTCCACCTTTCTGGGAAGGTCTGTCGAACCCGTCCCCCTCCAGCTC 

CCCCCTCTGGAAAGGCTCCACCTCGACTGTAGCGAAGACAGTGACGAACTGGATAAGTGGGGCTCCCTGTGGAACT 

GGTTCAATATCACCAACTGGCTGTGGTAGATTAAGATTTTCATTATGATTGTGGGAGGCAATAAGATTGTCAGGAT 

GTACCAACCTGTCTCCATCCTCGACATTAAGCAAGGCCCTAAGGAACCCTTCAGGGATTACGTGGACAGATTCGCT 

AAGCTCCTGTGGAAGGGAGAGGGAGCCGTCGTGATTCAGGACAACTCCGACATTAAGGTCGTGCCCAGGAGAAAGG

CTAAGATTATCGAACTGAATAAGAGAACCCAAGACTTTTGGGAAGCGCAACTGGGAATCCCTCACCATGCTGGACT 

GAAAAAGAAAAAGTCCGTGACAGTGGCCGCTATGAGAGTGAAAGAGACACAGATGAACTGGCCCAATCTGTGGAAG 

TGGGGCACAATGATTCTGGGACTGGTCATCATTTGCTCCGCCTCCATTAAGGTCAAACAGCTCTGCAAACTGCTCA 

GGGGTGCAAAGGCTCTGATAGACATTGTGCCACTGACAGAGGAAGCCGAACTGGAACTGCTCATATGGAAGTTTGA 

CTCCCACCTCGCCCTGAGACATATCGCCAGGGAACTGCATCCCGAGTACTACAAAGACTGCGCTGCTGTCGAGCTC 

CTGGGACGCTCCAGCCTCAAGGAACTGCGAAGGGGATGGGAAGCCCTCAAGTATTTGTGGAACCTCCTGCAGTATT 

GGGGCTCTAGCCTGGAGCAACTGCAATCTGCTCTGAAAACCGGATCAGAGGAACTGAGGTCCCTGTTTAACACAGT 

CGCTACCCTCTGGTGTGTGCATCAGGAGCTCTACAAATACAAAGTGGTCAAAATCGAACCCCTCGGCATTGCCCCT 

ACCAAAGCCAAAAGGAGAGTGGTCCAGAGAGAGAAAAGGCTCACCGATATCGTCACACTCACCGAAGAGGCTGAGC 

TGGAGCTGGAGGAAAACAGAGAGATTCTGAAGGAACCCGTCCACGGAGTGTATAGAGTGCTCGCCGAAGCCATGAG 

CCAAGCCAACAATGCCAACATCATGATGCAGAGAGGCAATTTCAGAGGCCCAAAGAGAATCATCAAACAAGAGGAA 

GAGGGGGTCGGCTTCCCCGTCAGGCCTCAGGTCCCACTGAGACCTATGACCTACAAAGCAGCCATCGATCTGTCCT 

TCTTCAAACAGGGACCCAAAGAGCCTTTCAGAGACTATGTGGATAGGTTTTTCAAAACCCTCAGGGCTGAGCAAGC

CTCACAGGAAGTGAAAAACTGGGAGAAAATCAGACTGAGATCTGGTGGCAAAAAGAAATACAAACTGAAACACATT

GTGTGGGCCTCCAGGGAACTGGAAAGGTTTGCCTCCCAGTATGCCCTCGGCATCATCCTAGCCCAACCCGATAAGT 

CCGAGTCCGAGCTCGTGAGTCAGATTATCGAAGAGCTCATCAAGAAGATTGCCGTCGCCGGATGGACAGACAGAGT 

CATTGAGGTCGTCCAAAGGGCTTGGAGAGCCATTCTGAATATCCCCAGGAGAATCAGACAGACTAGACTCGCCGGA 

AGGTGGCCCGTCAAGATAATCCATACCGATAACGGAAGCAATTTCACAAGCACTGCCGTCAAGGCTGCCTGCTGGT 

GGGCTGATGTGAAACAGCTCACCGAAGTCGTTCAGAAAATCGCTACCGAAAGCATTGTGATATGGGGAAAGACACC 

CAAGTTCAGACAGCCTATCGCTGCCGCCAGCAACGAGAACATGGACGCCATGGCTGCTtgaagatctgaattc 